<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: yep</title>
    <description>The latest articles on DEV Community by yep (@yepchaos).</description>
    <link>https://dev.to/yepchaos</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2089712%2Ffa6eed7d-19b8-48b9-8c23-dd66c11a895e.jpg</url>
      <title>DEV Community: yep</title>
      <link>https://dev.to/yepchaos</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/yepchaos"/>
    <language>en</language>
    <item>
      <title>Active/Active Multi-region - Chat application Architecture</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Tue, 21 Apr 2026 13:06:41 +0000</pubDate>
      <link>https://dev.to/yepchaos/activeactive-multi-region-chat-application-architecture-395b</link>
      <guid>https://dev.to/yepchaos/activeactive-multi-region-chat-application-architecture-395b</guid>
      <description>&lt;p&gt;In the previous post I covered how I connected two Kubernetes clusters across Mongolia and Germany using Netbird. That was the networking layer — pods can reach each other, DNS works across clusters. Now the interesting part: making the actual application work active/active across both regions.&lt;/p&gt;

&lt;p&gt;Active/active means both clusters run independently and serve users, but a user on cluster A can chat with a user on cluster B in real time. No single point of failure, no "primary" region. Either cluster can go down and the other keeps running.&lt;/p&gt;

&lt;p&gt;This breaks down into three problems: real-time events, chat history, and application state. Each one needs a different solution.&lt;/p&gt;

&lt;h2&gt;
  
  
  Part 1: Real-Time Events (NATS Super-Cluster)
&lt;/h2&gt;

&lt;p&gt;For single-cluster WebSocket scaling I already use NATS — covered in an earlier post. The short version: all WebSocket servers publish and subscribe through NATS, so a message from a user on server A reaches a user on server B without those servers knowing about each other.&lt;/p&gt;

&lt;p&gt;For multi-region, NATS has a concept called a &lt;strong&gt;super-cluster&lt;/strong&gt;. You deploy independent NATS clusters in each region and connect them together. Messages published in one cluster eventually replicate to the other. "Eventually" here means milliseconds of extra latency — there are more network hops, but I accept that.&lt;/p&gt;

&lt;p&gt;Setup is straightforward. Deploy NATS in each cluster using the operator (Helm chart), then configure the super-cluster by pointing each cluster at the other's gateway endpoints. After that, the application doesn't change at all. A backend in Germany subscribes to the same subjects as a backend in Mongolia. A message published in one region fans out to both. The application has no idea it's talking to a distributed system — it just publishes and subscribes like before.&lt;/p&gt;

&lt;p&gt;This is the cleanest part of the whole setup. NATS was designed for this, and it shows. Example &lt;code&gt;values.yaml&lt;/code&gt; for the Germany cluster (&lt;code&gt;astring-fsn1&lt;/code&gt;), with its gateway pointed at the Mongolia cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;merge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring-fsn1&lt;/span&gt;

  &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;7522&lt;/span&gt;
    &lt;span class="na"&gt;merge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring-fsn1&lt;/span&gt;
      &lt;span class="na"&gt;gateways&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring-mn&lt;/span&gt;
          &lt;span class="na"&gt;urls&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;nats://nats-mn-headless.nats.astring-mn.internal:7522&lt;/span&gt;

  &lt;span class="na"&gt;monitor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;8222&lt;/span&gt;
  &lt;span class="na"&gt;merge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;authorization&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;user&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;&amp;lt; $NATS_USER &amp;gt;&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;&amp;lt; $NATS_PASSWORD &amp;gt;&amp;gt;&lt;/span&gt;

&lt;span class="na"&gt;container&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;NATS_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nats-auth-secret&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;username&lt;/span&gt;
    &lt;span class="na"&gt;NATS_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nats-auth-secret&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;

&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nats&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;gateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;monitor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;promExporter&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;NATS_USER&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nats-auth-secret&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;username&lt;/span&gt;
    &lt;span class="na"&gt;NATS_PASSWORD&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;valueFrom&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;secretKeyRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nats-auth-secret&lt;/span&gt;
          &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password&lt;/span&gt;

&lt;span class="na"&gt;reloader&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="na"&gt;natsBox&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
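
&lt;p&gt;Before wiring the app in, it's worth confirming the gateway link actually carries traffic. Two quick checks — a sketch, assuming the Helm release is called &lt;code&gt;nats&lt;/code&gt; in a &lt;code&gt;nats&lt;/code&gt; namespace and that &lt;code&gt;natsBox&lt;/code&gt; created the usual &lt;code&gt;nats-box&lt;/code&gt; deployment; if you enabled the user/password auth above, pass &lt;code&gt;--user&lt;/code&gt;/&lt;code&gt;--password&lt;/code&gt; to the &lt;code&gt;nats&lt;/code&gt; calls as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# 1) Gateway state from the NATS monitoring endpoint (port 8222, enabled above)
kubectl --context astring-fsn1 -n nats port-forward svc/nats 8222:8222
curl -s http://localhost:8222/gatewayz          # from a second terminal

# 2) End-to-end fan-out: subscribe in Mongolia, publish from Germany
kubectl --context astring-mn -n nats exec -it deploy/nats-box -- \
  nats sub chat.demo                            # leave this running
kubectl --context astring-fsn1 -n nats exec -it deploy/nats-box -- \
  nats pub chat.demo "hello from fsn1"          # should show up on the MN subscriber
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
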



&lt;h2&gt;
  
  
  Part 2: Chat History (Cassandra, Not ScyllaDB)
&lt;/h2&gt;

&lt;p&gt;I was using ScyllaDB. In December 2024 ScyllaDB moved from AGPL to a source-available license — the code is still public on GitHub, but running a cluster beyond a certain size requires a commercial license, and ScyllaDB Manager (the tool for automation, repairs, and backups) is limited to 5 nodes on the free tier. The source is still visible, but it's no longer open source in any meaningful sense. So I switched to Cassandra, which is fully open source under Apache 2.0 and shares the same wide-column architecture.&lt;/p&gt;

&lt;p&gt;For multi-region, Cassandra is actually the best-fit database I've worked with. Cassandra natively understands the concept of &lt;strong&gt;datacenters&lt;/strong&gt; — your two sites aren't two separate clusters, they're two DCs in one logical Cassandra cluster. Replication is configured per-DC. Consistency levels let you decide per-query whether you need a local quorum (fast, single-region) or global quorum (slower but cross-region consistent).&lt;/p&gt;

&lt;p&gt;For chat history, I use local quorum for reads and writes. Messages replicate to the other DC asynchronously. A user reading chat history gets it from their local DC — fast. Eventually the other DC catches up. For chat history this is fine — nobody needs sub-millisecond cross-region consistency for reading old messages.&lt;/p&gt;
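
&lt;p&gt;Concretely that policy is just a keyspace definition plus a per-query consistency level. A minimal sketch from &lt;code&gt;cqlsh&lt;/code&gt; inside one of the Cassandra pods — the keyspace and table names are placeholders, the DC names match the &lt;code&gt;cluster.yaml&lt;/code&gt; below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Replicate the chat keyspace to both datacenters, three copies each
cqlsh -e "CREATE KEYSPACE IF NOT EXISTS chat
  WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3, 'dc2': 3};"

# Reads and writes only wait for the local DC; the other DC catches up async
cqlsh -e "CONSISTENCY LOCAL_QUORUM;
  SELECT * FROM chat.messages WHERE room_id = 42 LIMIT 50;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
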

&lt;p&gt;For Kubernetes I use the &lt;strong&gt;k8ssandra-operator&lt;/strong&gt;, which manages Cassandra clusters across multiple Kubernetes clusters. This is where it gets interesting: the operator needs to manage pods in both &lt;code&gt;cluster-mn&lt;/code&gt; and &lt;code&gt;cluster-de&lt;/code&gt;, which means it needs to reach both clusters. I deploy the k8ssandra-operator on a separate management cluster — a small single-node k3s cluster that reaches both application clusters through Netbird. The operator registers both clusters and treats them as two DCs in one Cassandra deployment.&lt;/p&gt;

&lt;p&gt;If the management cluster goes down, the Cassandra cluster keeps running — the operator just can't make configuration changes until it comes back. Acceptable tradeoff. After registering the two clusters with the operator (the official docs explain that step better than I could), my &lt;code&gt;cluster.yaml&lt;/code&gt; looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8ssandra.io/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;K8ssandraCluster&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;k8ssandra-operator&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;cassandra&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;serverVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4.0.10"&lt;/span&gt;
    &lt;span class="na"&gt;telemetry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;mcac&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="na"&gt;prometheus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
    &lt;span class="na"&gt;storageConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cassandraDataVolumeClaimSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openebs-hostpath&lt;/span&gt;
        &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
        &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
    &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cassandraYaml&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;listen_address&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0"&lt;/span&gt;
      &lt;span class="na"&gt;jvmOptions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;heapSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512M&lt;/span&gt;
    &lt;span class="na"&gt;datacenters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dc1&lt;/span&gt;
        &lt;span class="na"&gt;k8sContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring-mn&lt;/span&gt;
        &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dc2&lt;/span&gt;
        &lt;span class="na"&gt;k8sContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring-fsn1&lt;/span&gt;
        &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;stargate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
    &lt;span class="na"&gt;heapSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512M&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
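
&lt;p&gt;Once both DCs come up, the quickest sanity check is &lt;code&gt;nodetool status&lt;/code&gt; from any Cassandra pod — it should list nodes under both &lt;code&gt;dc1&lt;/code&gt; and &lt;code&gt;dc2&lt;/code&gt;. A sketch; the pod name follows cass-operator's &lt;code&gt;&amp;lt;cluster&amp;gt;-&amp;lt;dc&amp;gt;-default-sts-N&lt;/code&gt; convention, and with auth enabled you may need to pass JMX credentials to nodetool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl --context astring-mn -n k8ssandra-operator \
  exec -it astring-dc1-default-sts-0 -c cassandra -- nodetool status
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
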



&lt;h2&gt;
  
  
  Part 3: Application Database (The Hard Part)
&lt;/h2&gt;

&lt;p&gt;NATS and Cassandra were relatively clean. Postgres is where I spent most of my time.&lt;/p&gt;

&lt;p&gt;Postgres stores users, rooms, OTPs, metadata — all the relational data. The problem: Postgres has one primary at a time. All writes go to the primary; replicas are read-only. In a multi-region setup, if the primary is in Mongolia and a user in Germany logs in, that write either has to make the intercontinental round trip (~200ms penalty) or I need two primaries that stay in sync.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I Looked At
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;TiDB / SurrealDB (TiKV-based)&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;These are impressive databases but built for low-latency interconnects — single region or multi-AZ with &amp;lt;10ms between nodes. Stretch them across continents and the distributed SQL magic collapses for three specific reasons:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;TSO coordination latency.&lt;/em&gt; TiDB relies on a Placement Driver (PD) acting as a Timestamp Oracle (TSO) to assign globally ordered timestamps. While timestamp allocation is optimized (batched/pipelined), it still requires coordination with a leader. In a Mongolia–Germany setup, this introduces non-trivial latency before transaction execution, especially under high concurrency.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Raft + 2PC write latency.&lt;/em&gt; TiKV uses Raft consensus for replication and Percolator-style two-phase commit for distributed transactions. Writes require quorum acknowledgment, which in cross-region setups means at least one intercontinental round trip. Combined with 2PC coordination, end-to-end write latency can reach hundreds of milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Scaling across regions.&lt;/em&gt; Adding more regions increases coordination overhead (more replicas, more quorum distance). These systems scale well within a region, but cross-region deployments require careful topology design and acceptance of higher write latency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;CockroachDB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Similar characteristics here. &lt;em&gt;Consensus-driven latency.&lt;/em&gt; CockroachDB also uses Raft for replication. Cross-region writes require quorum, so latency is bounded by inter-region round trips, just like TiDB/TiKV.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Operational and licensing considerations.&lt;/em&gt; Recent versions have shifted licensing and feature availability. Advanced capabilities like geo-partitioning (which help localize data and reduce cross-region latency) are part of paid tiers. This introduces constraints for setups that require fine-grained data locality control without additional licensing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YugabyteDB&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one I actually deployed and tested. YugabyteDB is Kubernetes-native, supports active/active replication through their xCluster feature, and the management UI is genuinely good — modern, clear, well-designed. I ran it on both clusters using their operator.&lt;/p&gt;

&lt;p&gt;The xCluster setup works: deploy two independent YugabyteDB clusters, configure bidirectional xCluster replication between them. In theory, writes in Mongolia replicate to Germany and vice versa.&lt;/p&gt;

&lt;p&gt;The dealbreaker: &lt;strong&gt;no DDL replication&lt;/strong&gt;. Every time I add a table or alter a schema, I have to manually register each new table in the xCluster configuration. There's no automation for this in the open source version — I'd have to go into the dashboard, find the table ID, and add it manually every time. The UI for xCluster management is also rough. YugabyteDB Anywhere (their managed product) handles this properly, but that requires a license.&lt;/p&gt;

&lt;p&gt;Here are the values I used — they work fine for a single cluster if you're interested:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;Image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2025.2.1.0-b141&lt;/span&gt;

&lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2Gi&lt;/span&gt;
    &lt;span class="na"&gt;storageClass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openebs-hostpath&lt;/span&gt;
  &lt;span class="na"&gt;tserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2Gi&lt;/span&gt;
    &lt;span class="na"&gt;storageClass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openebs-hostpath&lt;/span&gt;

&lt;span class="na"&gt;resource&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.5Gi&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
  &lt;span class="na"&gt;tserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.5&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;0.5Gi&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;

&lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;tserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="na"&gt;partition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;master&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;tserver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;

&lt;span class="na"&gt;domainName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;zone&amp;gt;.internal"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And creating xCluster replication (run from inside a pod):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;yb-admin &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--master_addresses&lt;/span&gt; yb-master-0.yb-masters.yb.svc.astring-fsn1.internal:7100,... &lt;span class="se"&gt;\&lt;/span&gt;
  setup_universe_replication &lt;span class="se"&gt;\&lt;/span&gt;
  &amp;lt;replication_id&amp;gt; &lt;span class="se"&gt;\&lt;/span&gt;
  yb-master-0.yb-masters.yb.svc.astring-mn.internal:7100,... &lt;span class="se"&gt;\&lt;/span&gt;
  &amp;lt;table_id&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You get the table ID from the dashboard manually. As I said — not practical.&lt;/p&gt;
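
&lt;p&gt;For completeness, &lt;code&gt;yb-admin&lt;/code&gt; can also list table IDs without the dashboard — it's still one manual registration per table, so it doesn't change the conclusion. A sketch, with the same abbreviated master addresses as above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Print every table together with its table ID on the source universe
yb-admin \
  --master_addresses yb-master-0.yb-masters.yb.svc.astring-mn.internal:7100,... \
  list_tables include_table_id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
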

&lt;h3&gt;
  
  
  What I Actually Use: PgEdge + Spock
&lt;/h3&gt;

&lt;p&gt;After going through all of that, I ended up with two independent Postgres clusters synchronized using logical replication via &lt;strong&gt;Spock&lt;/strong&gt; — a Postgres extension that enables multi-master replication. &lt;strong&gt;PgEdge&lt;/strong&gt; is a Helm chart built on top of CloudNativePG that packages Spock with a proper Kubernetes operator.&lt;/p&gt;

&lt;p&gt;CloudNativePG is excellent — backup, restore, WAL archiving, high availability all work seamlessly. PgEdge adds Spock on top for the cross-cluster sync.&lt;/p&gt;

&lt;p&gt;The architecture: two independent Postgres clusters (one per region), each a primary with replicas. Spock creates a logical replication subscription in each direction — cluster-mn subscribes to cluster-de, cluster-de subscribes to cluster-mn. Writes in either region replicate to the other asynchronously.&lt;/p&gt;

&lt;p&gt;Helm values for each cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;pgEdge&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;appName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring-cluster&lt;/span&gt;
  &lt;span class="na"&gt;nodes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;n1&lt;/span&gt;
      &lt;span class="na"&gt;hostname&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;astring-cluster-n1-rw&lt;/span&gt;
      &lt;span class="na"&gt;clusterSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;instances&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
        &lt;span class="na"&gt;enableSuperuserAccess&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
        &lt;span class="na"&gt;postgresql&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;track_commit_timestamp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;on"&lt;/span&gt;
            &lt;span class="na"&gt;wal_level&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;logical"&lt;/span&gt;
        &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;barman-cloud.cloudnative-pg.io&lt;/span&gt;
            &lt;span class="na"&gt;isWALArchiver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
            &lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="na"&gt;barmanObjectName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;r2-storage&lt;/span&gt;
  &lt;span class="na"&gt;clusterSpec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
      &lt;span class="na"&gt;storageClass&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openebs-hostpath&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deploying both clusters, I run an initialization script that sets up the database, roles, and Spock nodes on each cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nv"&gt;CONTEXTS&lt;/span&gt;&lt;span class="o"&gt;=(&lt;/span&gt;&lt;span class="s2"&gt;"astring-mn"&lt;/span&gt; &lt;span class="s2"&gt;"astring-fsn1"&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;DB_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"astring_prod"&lt;/span&gt;
&lt;span class="nv"&gt;NAMESPACE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"pgedge"&lt;/span&gt;
&lt;span class="nv"&gt;POD_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"astring-cluster-n1-1"&lt;/span&gt;

&lt;span class="k"&gt;for &lt;/span&gt;CTX &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;CONTEXTS&lt;/span&gt;&lt;span class="p"&gt;[@]&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"--- Initializing: &lt;/span&gt;&lt;span class="nv"&gt;$CTX&lt;/span&gt;&lt;span class="s2"&gt; ---"&lt;/span&gt;

    &lt;span class="nv"&gt;POD_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get pod &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$POD_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--context&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NAMESPACE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.podIP}'&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;SUPER_PASS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get secret &lt;span class="s2"&gt;"astring-cluster-n1-superuser"&lt;/span&gt; &lt;span class="nt"&gt;--context&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NAMESPACE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.data.password}'&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
    &lt;span class="nv"&gt;APP_PASS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;kubectl get secret &lt;span class="s2"&gt;"astring-cluster-n1-app"&lt;/span&gt; &lt;span class="nt"&gt;--context&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NAMESPACE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.data.password}'&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

    &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="k"&gt;*&lt;/span&gt;&lt;span class="s2"&gt;"mn"&lt;/span&gt;&lt;span class="k"&gt;*&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;NODE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"region_mn"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nv"&gt;NODE_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"region_fsn1"&lt;/span&gt;
    &lt;span class="nv"&gt;DSN_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"astring-cluster-n1-rw.pgedge.svc.cluster.local"&lt;/span&gt;

    &lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$SUPER_PASS&lt;/span&gt;

    psql &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$POD_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; postgres &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"CREATE DATABASE &lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt;;"&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;true
    &lt;/span&gt;psql &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$POD_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
        ALTER ROLE app WITH REPLICATION;
        ALTER DATABASE &lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt; OWNER TO app;
        CREATE EXTENSION IF NOT EXISTS spock;
        GRANT USAGE ON SCHEMA spock TO app;
        GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA spock TO app;
        GRANT ALL PRIVILEGES ON ALL FUNCTIONS IN SCHEMA spock TO app;
    "&lt;/span&gt;

    psql &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$POD_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
        SELECT spock.node_create(
            node_name := '&lt;/span&gt;&lt;span class="nv"&gt;$NODE_NAME&lt;/span&gt;&lt;span class="s2"&gt;',
            dsn := 'host=&lt;/span&gt;&lt;span class="nv"&gt;$DSN_HOST&lt;/span&gt;&lt;span class="s2"&gt; port=5432 dbname=&lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt; user=app password=&lt;/span&gt;&lt;span class="nv"&gt;$APP_PASS&lt;/span&gt;&lt;span class="s2"&gt;'
        );
    "&lt;/span&gt;
&lt;span class="k"&gt;done

&lt;/span&gt;&lt;span class="nb"&gt;unset &lt;/span&gt;PGPASSWORD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then sync an initial data dump to make both clusters start from the same state, and set up bidirectional replication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;

&lt;span class="nv"&gt;C1_CTX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"astring-mn"&lt;/span&gt;
&lt;span class="nv"&gt;C2_CTX&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"astring-fsn1"&lt;/span&gt;
&lt;span class="nv"&gt;NS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"pgedge"&lt;/span&gt;
&lt;span class="nv"&gt;DB_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"production_db"&lt;/span&gt;

&lt;span class="nv"&gt;C1_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgres-primary.pgedge.astring-mn.internal"&lt;/span&gt;
&lt;span class="nv"&gt;C2_HOST&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"postgres-primary.pgedge.astring-fsn1.internal"&lt;/span&gt;

get_ip&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; kubectl get pod &lt;span class="s2"&gt;"astring-cluster-n1-1"&lt;/span&gt; &lt;span class="nt"&gt;--context&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.status.podIP}'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;
get_pass&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt; kubectl get secret &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$2&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--context&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$NS&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; &lt;span class="nv"&gt;jsonpath&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'{.data.password}'&lt;/span&gt; | &lt;span class="nb"&gt;base64&lt;/span&gt; &lt;span class="nt"&gt;--decode&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;C1_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_ip &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C1_CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;C2_IP&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_ip &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C2_CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;C1_SUP_PASS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_pass &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C1_CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"astring-cluster-n1-superuser"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;C2_SUP_PASS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_pass &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C2_CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"astring-cluster-n1-superuser"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;C1_APP_PASS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_pass &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C1_CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"astring-cluster-n1-app"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;C2_APP_PASS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;get_pass &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C2_CTX&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="s2"&gt;"astring-cluster-n1-app"&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;# FSN1 subscribes to MN&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$C2_SUP_PASS&lt;/span&gt;
psql &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C2_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
SELECT spock.sub_create(
    subscription_name := 'sub_to_region_mn',
    provider_dsn := 'host=&lt;/span&gt;&lt;span class="nv"&gt;$C1_HOST&lt;/span&gt;&lt;span class="s2"&gt; port=5432 dbname=&lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt; user=app password=&lt;/span&gt;&lt;span class="nv"&gt;$C1_APP_PASS&lt;/span&gt;&lt;span class="s2"&gt;'
);"&lt;/span&gt;

&lt;span class="c"&gt;# MN subscribes to FSN1&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PGPASSWORD&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;$C1_SUP_PASS&lt;/span&gt;
psql &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$C1_IP&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-U&lt;/span&gt; postgres &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"
SELECT spock.sub_create(
    subscription_name := 'sub_to_region_fsn1',
    provider_dsn := 'host=&lt;/span&gt;&lt;span class="nv"&gt;$C2_HOST&lt;/span&gt;&lt;span class="s2"&gt; port=5432 dbname=&lt;/span&gt;&lt;span class="nv"&gt;$DB_NAME&lt;/span&gt;&lt;span class="s2"&gt; user=app password=&lt;/span&gt;&lt;span class="nv"&gt;$C2_APP_PASS&lt;/span&gt;&lt;span class="s2"&gt;'
);"&lt;/span&gt;

&lt;span class="nb"&gt;unset &lt;/span&gt;PGPASSWORD
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
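
&lt;p&gt;To confirm the subscriptions are actually flowing, Spock exposes a status function — a quick check against either node (a sketch reusing the variables from the script above; if your Spock version names it differently, &lt;code&gt;\df spock.*&lt;/code&gt; lists what's available):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Both sides should report their subscription as replicating
export PGPASSWORD=$C1_SUP_PASS
psql -h "$C1_IP" -U postgres -d "$DB_NAME" -c "SELECT * FROM spock.sub_show_status();"
export PGPASSWORD=$C2_SUP_PASS
psql -h "$C2_IP" -U postgres -d "$DB_NAME" -c "SELECT * FROM spock.sub_show_status();"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
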



&lt;p&gt;DDL syncs automatically — add a table in one cluster, it appears in the other. No manual table registration like YugabyteDB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One important detail:&lt;/strong&gt; with two independent primaries both generating IDs, you need to make sure sequences don't conflict. If both clusters auto-increment from 1, you get duplicate primary keys. Either offset the sequences on each database (different start values, an increment equal to the number of regions), or generate IDs in the application with UUIDv7 or Snowflake IDs.&lt;/p&gt;
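
&lt;p&gt;A minimal sketch of the offset approach for a hypothetical &lt;code&gt;users_id_seq&lt;/code&gt; — Mongolia hands out odd IDs, Germany even ones, so the two primaries can never collide:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# MN primary: 1, 3, 5, ...
psql -h "$C1_IP" -U postgres -d "$DB_NAME" -c \
  "ALTER SEQUENCE users_id_seq RESTART WITH 1 INCREMENT BY 2;"

# FSN1 primary: 2, 4, 6, ...
psql -h "$C2_IP" -U postgres -d "$DB_NAME" -c \
  "ALTER SEQUENCE users_id_seq RESTART WITH 2 INCREMENT BY 2;"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
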

&lt;h2&gt;
  
  
  Where It Stands
&lt;/h2&gt;

&lt;p&gt;Both clusters run independently behind GSLB. Users are routed to the nearest region, so normal operations stay local — no cross-ocean round trips on the critical path. Within each cluster, data remains strongly consistent. Across regions, I accept eventual consistency and the small window where state may diverge (e.g., a newly created user that hasn’t replicated yet during a failover). &lt;em&gt;(I’ll cover the GSLB setup and routing details separately.)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Real-time messaging flows through a &lt;strong&gt;NATS&lt;/strong&gt; supercluster, chat history is replicated using &lt;strong&gt;Apache Cassandra&lt;/strong&gt;’s multi–data center replication, and application-level state syncs through Spock. &lt;/p&gt;

&lt;p&gt;Is this over-engineered for a chat app with 10 users? Yes. &lt;/p&gt;

&lt;p&gt;The architecture scales in a straightforward way — adding a new region means deploying another cluster and integrating it into the existing messaging and replication topology, with the usual tradeoffs around replication lag and consistency.&lt;/p&gt;

&lt;p&gt;ArgoCD manages all of this across all three clusters — application clusters and management cluster — through ApplicationSets. Maybe I’ll write about this later.&lt;/p&gt;

</description>
      <category>multiregion</category>
      <category>activeactive</category>
      <category>postgres</category>
    </item>
    <item>
      <title>Multi region kubernetes cluster (Onprem/Public Cloud)</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Mon, 20 Apr 2026 06:11:33 +0000</pubDate>
      <link>https://dev.to/yepchaos/multi-region-kubernetes-cluster-onprempublic-cloud-4ifl</link>
      <guid>https://dev.to/yepchaos/multi-region-kubernetes-cluster-onprempublic-cloud-4ifl</guid>
      <description>&lt;p&gt;A single Kubernetes cluster is manageable. You deploy things, they run, life is good. Things get complicated when you have clusters in multiple regions that need to talk to each other. This post covers why I needed multi-region, what I tried, what didn’t work, and how I got pod-to-pod connectivity between Mongolia and Germany using Netbird.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Multi-Region
&lt;/h2&gt;

&lt;p&gt;The answer has two parts.&lt;/p&gt;

&lt;p&gt;First: the Mongolia cluster is on-premise, and this infrastructure is unstable. Network switches fail, the DC has maintenance windows, things go down. With only one cluster, when it goes down, the app goes down. Even with 10 users, they deserve better.&lt;/p&gt;

&lt;p&gt;Second: I wanted to build something complicated and solve problems I didn't actually have yet. Active/active multi-region means clients in different geographies connect to their nearest cluster — a user in Germany shouldn't be routing through Mongolia just to send a chat message. That's ~200ms round trip just to reach the server. For a chat app that latency is noticeable. I don't have German users yet. I don't have many users at all. But the architecture is ready for when the app explodes. Hope it’s gonna explode.&lt;/p&gt;

&lt;p&gt;For the second cluster I chose Hetzner — cheap, stable, good network. Germany datacenter. I provision it with Terraform and run k3s with Cilium, same as Mongolia.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Looked At and Ruled Out
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Stretched k3s Cluster (Single etcd)
&lt;/h3&gt;

&lt;p&gt;k3s supports Tailscale natively, which means you could in theory run a single stretched cluster across two DCs — nodes in both Mongolia and Germany joining the same k3s cluster with a shared etcd.&lt;/p&gt;

&lt;p&gt;The problem is etcd. It's quorum-based — a majority of nodes must agree on every write. With nodes split across two DCs and ~200ms latency between them, every etcd write waits for that round trip. That's not acceptable for a control plane. You'd also need an odd number of etcd nodes for quorum, which means either an unbalanced split between DCs or a third observer node somewhere. The complexity and latency made this a non-starter before even trying it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cilium Cluster Mesh
&lt;/h3&gt;

&lt;p&gt;Cilium has a built-in multi-cluster feature called cluster mesh. It connects multiple clusters at the network level — services become reachable across clusters natively, policies apply across clusters, load balancing works across clusters. It's exactly what I wanted.&lt;/p&gt;

&lt;p&gt;The requirement: cluster nodes must be directly reachable from each other. The Mongolia cluster is behind NAT — its nodes have private IPs, not public ones. Without a way to make those nodes reachable from Germany, cluster mesh won't work. I ruled this out before attempting the setup.&lt;/p&gt;

&lt;p&gt;This is important to understand about the architecture: each cluster internally uses native Cilium networking, no VPN overhead. Pod-to-pod traffic within a cluster is fully native. Netbird only sits between clusters — it's the bridge layer, not the base layer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution: Netbird for Cross-Cluster Connectivity
&lt;/h2&gt;

&lt;p&gt;Netbird is a WireGuard-based overlay network with a routing architecture that handles NAT traversal automatically. The key feature: you can configure routing peers that expose a private network to the Netbird network. Other Netbird peers can then reach that private network through the router, even if the network itself is behind NAT.&lt;/p&gt;

&lt;p&gt;For my setup:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cluster-mn&lt;/code&gt; (Mongolia, behind NAT) — pod CIDR &lt;code&gt;&amp;lt;MN_POD_CIDR&amp;gt;&lt;/code&gt;, service CIDR &lt;code&gt;&amp;lt;MN_SVC_CIDR&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;cluster-de&lt;/code&gt; (Germany, Hetzner) — pod CIDR &lt;code&gt;&amp;lt;DE_POD_CIDR&amp;gt;&lt;/code&gt;, service CIDR &lt;code&gt;&amp;lt;DE_SVC_CIDR&amp;gt;&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pod and service CIDRs must be different between clusters — if both use the same ranges, routing breaks.&lt;/p&gt;
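
&lt;p&gt;In k3s that's just two server flags, set differently per cluster — a sketch using the placeholder CIDRs from above (the flannel/network-policy flags are there because Cilium is the CNI):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Germany cluster — pod and service ranges must not overlap with Mongolia's
k3s server \
  --cluster-cidr=&amp;lt;DE_POD_CIDR&amp;gt; \
  --service-cidr=&amp;lt;DE_SVC_CIDR&amp;gt; \
  --flannel-backend=none \
  --disable-network-policy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
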

&lt;h3&gt;
  
  
  Option 1: Netbird Agent on VMs (Rejected)
&lt;/h3&gt;

&lt;p&gt;Install Netbird agent directly on each cluster's nodes, use those nodes as routing peers. Pods wanting to reach the other cluster go through a Cilium Egress Gateway to the VM running the agent.&lt;/p&gt;

&lt;p&gt;This works. I tested it. But it means managing agents on VMs manually, making sure they stay running, handling updates. I didn't want that operational overhead. I'm already over-engineering enough things.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Netbird Kubernetes Operator (What I Use)
&lt;/h3&gt;

&lt;p&gt;Netbird has a Kubernetes operator that installs agents as pods inside the cluster. No VM management needed.&lt;/p&gt;

&lt;p&gt;The problem: the operator's router pod doesn't use &lt;code&gt;hostNetwork&lt;/code&gt;, so pods inside the cluster can't reach the Netbird network through it. The pod is isolated from the node's network namespace. A sidecar per workload would be a workaround, but for Cassandra and the other stateful components that gets complicated fast.&lt;/p&gt;

&lt;p&gt;The fix: patch the router deployment to enable &lt;code&gt;hostNetwork: true&lt;/code&gt; after it's deployed. The Netbird Helm chart doesn't expose this as a configurable value, so I use a Kubernetes Job that runs post-deploy and patches the deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;batch/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Job&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patch-router-hostnetwork&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;netbird&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;argocd.argoproj.io/hook&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PostSync&lt;/span&gt;
    &lt;span class="na"&gt;argocd.argoproj.io/hook-delete-policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;BeforeHookCreation&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;backoffLimit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;serviceAccountName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;netbird-patcher&lt;/span&gt;
      &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OnFailure&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;patch&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bitnami/kubectl:latest&lt;/span&gt;
          &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;sh&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-c&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
              &lt;span class="s"&gt;echo "Waiting for router deployment..."&lt;/span&gt;
              &lt;span class="s"&gt;kubectl rollout status deployment/router -n netbird --timeout=120s&lt;/span&gt;
              &lt;span class="s"&gt;echo "Patching hostNetwork..."&lt;/span&gt;
              &lt;span class="s"&gt;kubectl patch deployment router -n netbird --type=json -p='[&lt;/span&gt;
                &lt;span class="s"&gt;{"op":"add","path":"/spec/template/spec/hostNetwork","value":true},&lt;/span&gt;
                &lt;span class="s"&gt;{"op":"add","path":"/spec/template/spec/dnsPolicy","value":"ClusterFirstWithHostNet"}&lt;/span&gt;
              &lt;span class="s"&gt;]'&lt;/span&gt;
              &lt;span class="s"&gt;echo "Done."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Job is triggered by ArgoCD as a PostSync hook — it runs automatically after every sync, so the patch is always applied even after redeployments. I'll write a dedicated post on ArgoCD, but this is a good example of why it's useful — this whole setup is just config, no manual steps. &lt;/p&gt;

&lt;p&gt;The router pod also needs to run on a specific node (the one that will act as the routing peer), so I label that node and add node affinity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;router&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;netbird-router&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
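
&lt;p&gt;The label that &lt;code&gt;nodeSelector&lt;/code&gt; (and the egress policy below) matches on is applied once per cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Pin the routing peer to a specific node
kubectl --context astring-fsn1 label node &amp;lt;node-name&amp;gt; netbird-router=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
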



&lt;h3&gt;
  
  
  Cilium Egress Gateway
&lt;/h3&gt;

&lt;p&gt;With the router pod running with &lt;code&gt;hostNetwork: true&lt;/code&gt; on the designated node, I configure a CiliumEgressGatewayPolicy. Any pod in &lt;code&gt;cluster-de&lt;/code&gt; wanting to reach &lt;code&gt;&amp;lt;MN_POD_CIDR&amp;gt;&lt;/code&gt; (Mongolia's pod network) exits through the router node's &lt;code&gt;wt0&lt;/code&gt; interface (the WireGuard/Netbird interface):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumEgressGatewayPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;netbird-egress&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;selectors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;podSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;destinationCIDRs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;MN_POD_CIDR&amp;gt;&lt;/span&gt;
  &lt;span class="na"&gt;egressGateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;netbird-router&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
    &lt;span class="na"&gt;interface&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;wt0"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; the egress policy only covers the other cluster's pod CIDR, not Netbird IPs (&lt;code&gt;100.115.x.x&lt;/code&gt; range). This is intentional. If you add an egress policy for the Netbird IP range, you create a routing loop — traffic arrives from a Netbird IP, the egress policy kicks in and tries to redirect it back through the router, and it never reaches the pod. By leaving Netbird IPs out of the egress policy, pods can't directly reach the Netbird network (minor downside), but the loop is avoided and cross-cluster traffic works correctly.&lt;/p&gt;
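
&lt;p&gt;To sanity-check the routing before DNS is involved, I can hit a pod IP from the other cluster directly and watch the traffic leave through the router node. The deployment name and pod IP below are placeholders, and the target image needs ping installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# from any pod in cluster-de, ping a pod IP that falls inside the MN pod CIDR
kubectl exec -it deploy/chat-backend -- ping -c 3 10.52.1.23

# on the router node, confirm the packets go out via the Netbird interface
sudo tcpdump -ni wt0 icmp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
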

&lt;p&gt;Enable egress gateway in Cilium's Helm values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;egressGateway&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
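


&lt;p&gt;If Cilium itself is managed with Helm, enabling it is an upgrade of the release. As far as I know the egress gateway also expects BPF masquerading and kube-proxy replacement, so those values are included here; treat this as a sketch and check the docs for your Cilium version (release name and namespace assumed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade cilium cilium/cilium \
  --namespace kube-system \
  --reuse-values \
  --set egressGateway.enabled=true \
  --set bpf.masquerade=true \
  --set kubeProxyReplacement=true

# restart the agents so the new datapath configuration takes effect
kubectl -n kube-system rollout restart daemonset/cilium
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;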



&lt;p&gt;The same setup runs mirror-image on &lt;code&gt;cluster-mn&lt;/code&gt;, with egress policy pointing at &lt;code&gt;&amp;lt;DE_POD_CIDR&amp;gt;&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cross-Cluster Service Discovery
&lt;/h2&gt;

&lt;p&gt;Pod-to-pod connectivity works now, but pod IPs change. You can't exactly hardcode them. Kubernetes service IPs don't work across clusters either — Cilium uses virtual routing for service IPs, and a pod in &lt;code&gt;cluster-de&lt;/code&gt; trying to reach a service IP from &lt;code&gt;cluster-mn&lt;/code&gt; doesn't know how to resolve it. The solution is a bit of a DNS relay. We need to trick the pods into asking the &lt;em&gt;other&lt;/em&gt; cluster for the right IP.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Relay Logic
&lt;/h3&gt;

&lt;p&gt;When a pod in Germany wants to find &lt;code&gt;database.namespace.svc.astring-mn.internal&lt;/code&gt;, the request follows this path:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Local CoreDNS:&lt;/strong&gt; Realizes it doesn't own &lt;code&gt;.astring-mn.internal&lt;/code&gt; and forwards it to our "DNS Bridge."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DNS Bridge:&lt;/strong&gt; A CoreDNS pod running on the &lt;code&gt;hostNetwork&lt;/code&gt; of the router node. It listens on port &lt;code&gt;1053&lt;/code&gt; and forwards the request over the Netbird tunnel.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Netbird Peer:&lt;/strong&gt; The request travels through the WireGuard tunnel to a Netbird peer that can see the Mongolia cluster's internal API/DNS.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Resolution:&lt;/strong&gt; The IP comes back, and the egress policy we set up earlier handles the actual data routing.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  CoreDNS Custom Config
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coredns-custom&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;astring-fsn1.server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;astring-fsn1.internal:53 {&lt;/span&gt;
        &lt;span class="s"&gt;rewrite name suffix .svc.astring-fsn1.internal .svc.cluster.local&lt;/span&gt;
        &lt;span class="s"&gt;rewrite name suffix .astring-fsn1.internal .svc.cluster.local&lt;/span&gt;
        &lt;span class="s"&gt;kubernetes cluster.local&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
  &lt;span class="na"&gt;astring-mn.server&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;astring-mn.internal:53 {&lt;/span&gt;
      &lt;span class="s"&gt;errors&lt;/span&gt;
      &lt;span class="s"&gt;cache 30&lt;/span&gt;
      &lt;span class="s"&gt;forward . &amp;lt;ROUTER_NODE_IP&amp;gt;:1053&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Queries for &lt;code&gt;*.astring-mn.internal&lt;/code&gt; get forwarded to port 1053 on the router node. The router node runs a DNS bridge that forwards those queries into the other cluster's network via Netbird.&lt;/p&gt;
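
&lt;p&gt;The bridge can be tested on its own from any machine that can reach the router node, by querying port 1053 directly. The service name below is just an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;dig +short @&amp;lt;ROUTER_NODE_IP&amp;gt; -p 1053 database.scylla.svc.astring-mn.internal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
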

&lt;h3&gt;
  
  
  DNS Bridge: The middleman
&lt;/h3&gt;

&lt;p&gt;The DNS bridge is a CoreDNS instance running on the router node with &lt;code&gt;hostNetwork: true&lt;/code&gt;, forwarding queries to a Netbird peer IP:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;apps/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Deployment&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dns-bridge-mn&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;selector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;matchLabels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dns-bridge-mn&lt;/span&gt;
  &lt;span class="na"&gt;template&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;labels&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;app&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dns-bridge-mn&lt;/span&gt;
    &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;netbird-router&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
      &lt;span class="na"&gt;hostNetwork&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coredns&lt;/span&gt;
          &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;coredns/coredns:1.10.1&lt;/span&gt;
          &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-conf"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/etc/coredns/Corefile"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
          &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;config-volume&lt;/span&gt;
              &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/etc/coredns&lt;/span&gt;
              &lt;span class="na"&gt;readOnly&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
      &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;config-volume&lt;/span&gt;
          &lt;span class="na"&gt;configMap&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dns-bridge-conf&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ConfigMap&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dns-bridge-conf&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;data&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;Corefile&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;astring-mn.internal:1053 {&lt;/span&gt;
      &lt;span class="s"&gt;errors&lt;/span&gt;
      &lt;span class="s"&gt;cache 30&lt;/span&gt;
      &lt;span class="s"&gt;forward . &amp;lt;NETBIRD_PEER_IP&amp;gt;&lt;/span&gt;
    &lt;span class="s"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;&amp;lt;NETBIRD_PEER_IP&amp;gt;&lt;/code&gt; is a fixed Netbird IP — in my case, the management VM's Netbird agent. I chose it because I wanted a stable IP that doesn't change; using another cluster's routing peer IP would also work. This is not ideal, since the address should really be resolved dynamically, but it works and I'll improve it later. The same applies to &lt;code&gt;&amp;lt;ROUTER_NODE_IP&amp;gt;&lt;/code&gt; in the CoreDNS config above. Future me has a lot of work to do.&lt;/p&gt;

&lt;p&gt;With this in place, a pod in &lt;code&gt;cluster-de&lt;/code&gt; can reach a service in &lt;code&gt;cluster-mn&lt;/code&gt; using &lt;code&gt;service-name.namespace.svc.astring-mn.internal&lt;/code&gt;. CoreDNS forwards the query to the DNS bridge, the bridge forwards it through Netbird to the other cluster's DNS, gets the pod IP back, and the egress policy routes the traffic through the router node.&lt;/p&gt;
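
&lt;p&gt;A quick end-to-end check from inside &lt;code&gt;cluster-de&lt;/code&gt; (the service names and port here are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# name resolution: local CoreDNS, then the DNS bridge, then cluster-mn over Netbird
kubectl run dns-check --rm -it --restart=Never --image=busybox -- \
  nslookup nats.nats.svc.astring-mn.internal

# actual traffic: routed through the egress gateway and the Netbird tunnel
kubectl run http-check --rm -it --restart=Never --image=busybox -- \
  wget -qO- http://chat-backend.chat.svc.astring-mn.internal:8080/healthz
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
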

&lt;h2&gt;
  
  
  Managing All of This with ArgoCD
&lt;/h2&gt;

&lt;p&gt;If you're doing this manually — applying patches, keeping configs in sync across two clusters, making sure the post-deploy job runs — it becomes a nightmare quickly. ArgoCD manages all of it declaratively. The patch job, the egress policies, the CoreDNS configs, the Netbird operator — all defined as ApplicationSets, applied automatically on sync. I'll cover ArgoCD properly in a separate post.&lt;/p&gt;
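
&lt;p&gt;To give a flavour of what that looks like, here is a trimmed-down sketch of an ApplicationSet that stamps the per-cluster networking config onto both clusters. The repo URL, paths and registered cluster names are placeholders:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: cross-cluster-networking
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - cluster: cluster-de
          - cluster: cluster-mn
  template:
    metadata:
      name: 'networking-{{cluster}}'
    spec:
      project: default
      source:
        repoURL: https://github.com/example/infra.git   # placeholder repo
        targetRevision: main
        path: 'clusters/{{cluster}}/networking'
      destination:
        name: '{{cluster}}'                             # cluster as registered in ArgoCD
        namespace: kube-system
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
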

&lt;h2&gt;
  
  
  Current State
&lt;/h2&gt;

&lt;p&gt;Pod-to-pod connectivity between &lt;code&gt;cluster-mn&lt;/code&gt; and &lt;code&gt;cluster-de&lt;/code&gt; is working. Services are resolvable across clusters using the custom CoreDNS setup. NATS supercluster and cross-DC ScyllaDB replication both run on top of this — that's a separate post covering the active/active chat architecture.&lt;/p&gt;

&lt;p&gt;Known limitations I'll fix later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;ROUTER_NODE_IP&amp;gt;&lt;/code&gt; in CoreDNS is hardcoded — should be dynamic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;&amp;lt;NETBIRD_PEER_IP&amp;gt;&lt;/code&gt; using management VM is not ideal — should use a cluster routing peer directly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Is this over-engineered for a chat app with a small number of users? Absolutely.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>multiregion</category>
      <category>networking</category>
    </item>
    <item>
      <title>Securing Kubernetes with Wireguard &amp; others</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Mon, 20 Apr 2026 05:51:15 +0000</pubDate>
      <link>https://dev.to/yepchaos/securing-kubernetes-with-wireguard-others-2ndp</link>
      <guid>https://dev.to/yepchaos/securing-kubernetes-with-wireguard-others-2ndp</guid>
      <description>&lt;p&gt;As the cluster grew, I needed secure access to internal services — PostgreSQL, NATS, Redis, ScyllaDB — without exposing them to the public internet. I went through three solutions: WireGuard, Tailscale, and eventually Netbird. Each one taught me something.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;Early on I was forwarding TCP ports directly through the ingress controller:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;tcp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4222"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nats/nats-cluster:4222&lt;/span&gt;
  &lt;span class="s"&gt;"5432"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pgo/astring-ha:5432&lt;/span&gt;
  &lt;span class="s"&gt;"6379"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;redis/redis:6379&lt;/span&gt;
  &lt;span class="s"&gt;"9042"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;scylla/scylla-client:9042&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each service has authentication, so it's not completely open. But exposing databases directly to the public internet is bad practice regardless. I wanted internal services reachable only through a private network, with only HTTP endpoints publicly accessible.&lt;/p&gt;

&lt;h2&gt;
  
  
  WireGuard
&lt;/h2&gt;

&lt;p&gt;WireGuard is a modern VPN — lightweight, fast, minimal codebase compared to OpenVPN. The core idea: create an encrypted tunnel between your machine and the cluster. Once connected, internal services look like they're on your local network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup on Kubernetes
&lt;/h3&gt;

&lt;p&gt;I deployed WireGuard using Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;helm repo add wireguard https://bryopsida.github.io/wireguard-chart&lt;/span&gt;
&lt;span class="s"&gt;helm repo update&lt;/span&gt;

&lt;span class="s"&gt;helm install wireguard wireguard/wireguard \&lt;/span&gt;
  &lt;span class="s"&gt;--namespace wireguard \&lt;/span&gt;
  &lt;span class="s"&gt;--create-namespace \&lt;/span&gt;
  &lt;span class="s"&gt;-f values.yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ClusterIP&lt;/span&gt;

&lt;span class="na"&gt;wireguard&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;clients&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;AllowedIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.34.0.2/32&lt;/span&gt;
      &lt;span class="na"&gt;PublicKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;client_public_key&amp;gt;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expose UDP port 51820 through the ingress:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;udp&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;51820"&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt; &lt;span class="s"&gt;wireguard/wireguard-wireguard:51820&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Client Setup
&lt;/h3&gt;

&lt;p&gt;Generate keys on your local machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;wg genkey | tee privatekey | wg pubkey &amp;gt; publickey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Get the server's public key from the Kubernetes secret:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;kubectl get secret wireguard-wireguard -n wireguard -o yaml | grep publicKey&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Client config (&lt;code&gt;wg0.conf&lt;/code&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Interface&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="s"&gt;PrivateKey = &amp;lt;your_private_key&amp;gt;&lt;/span&gt;
&lt;span class="s"&gt;Address = 172.32.32.2/32&lt;/span&gt;
&lt;span class="s"&gt;DNS = 10.43.0.10, 8.8.8.8&lt;/span&gt;

&lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Peer&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;span class="s"&gt;PublicKey = &amp;lt;server_public_key&amp;gt;&lt;/span&gt;
&lt;span class="s"&gt;AllowedIPs = 10.0.0.0/16, 10.43.0.0/16, 172.32.32.0/24&lt;/span&gt;
&lt;span class="s"&gt;Endpoint = &amp;lt;cluster_public_ip&amp;gt;:51820&lt;/span&gt;
&lt;span class="s"&gt;PersistentKeepalive = &lt;/span&gt;&lt;span class="m"&gt;25&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Bring the tunnel up: &lt;code&gt;wg-quick up wg0&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Done — internal services are now accessible through the tunnel, not exposed publicly.&lt;/p&gt;

&lt;p&gt;WireGuard worked. But managing keys manually, updating the Helm values for each new client, and handling the configuration yourself gets old quickly. Relying on pod IPs was also fragile, since they change whenever pods are rescheduled. Then I found Tailscale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tailscale
&lt;/h2&gt;

&lt;p&gt;Tailscale is WireGuard underneath, but with everything you'd otherwise build yourself already done — key management, device registration, DNS, access policies, a UI. You don't touch keys manually. You don't write config files. You just install it and devices appear in your network.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kubernetes Integration
&lt;/h3&gt;

&lt;p&gt;Tailscale has a Kubernetes operator. Install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="s"&gt;helm repo add tailscale https://pkgs.tailscale.com/helmcharts&lt;/span&gt;
&lt;span class="s"&gt;helm repo update&lt;/span&gt;

&lt;span class="s"&gt;helm upgrade \&lt;/span&gt;
  &lt;span class="s"&gt;--install \&lt;/span&gt;
  &lt;span class="s"&gt;tailscale-operator \&lt;/span&gt;
  &lt;span class="s"&gt;tailscale/tailscale-operator \&lt;/span&gt;
  &lt;span class="s"&gt;--namespace=tailscale \&lt;/span&gt;
  &lt;span class="s"&gt;--create-namespace \&lt;/span&gt;
  &lt;span class="s"&gt;--set-string oauth.clientId=&amp;lt;client_id&amp;gt; \&lt;/span&gt;
  &lt;span class="s"&gt;--set-string oauth.clientSecret=&amp;lt;client_secret&amp;gt; \&lt;/span&gt;
  &lt;span class="s"&gt;--wait&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then expose a service to your Tailscale network by adding one annotation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;annotations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;tailscale.com/expose&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;true"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's it. The service gets a Tailscale IP and a DNS name automatically. No UDP port forwarding, no ingress config, no manual key exchange.&lt;/p&gt;
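
&lt;p&gt;For an existing Service, like the PostgreSQL service I used to expose through the ingress TCP config, the annotation can simply be added in place (service name and namespace are from my setup):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl annotate service astring-ha -n pgo tailscale.com/expose="true"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
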

&lt;h3&gt;
  
  
  What Made It Better
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;MagicDNS&lt;/strong&gt; — every device and service on your Tailscale network gets a DNS name. Instead of remembering &lt;code&gt;10.43.x.x&lt;/code&gt;, you connect to &lt;code&gt;postgres.tailnet-name.ts.net&lt;/code&gt;. Works automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Access controls from the UI&lt;/strong&gt; — you define which devices can reach which services through a policy file in the Tailscale dashboard. No YAML in the cluster, no firewall rules to manage manually.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sharing&lt;/strong&gt; — you can share specific services with other Tailscale accounts from the UI. Useful for giving someone temporary access without adding them to your cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-platform&lt;/strong&gt; — Tailscale client runs on Linux, macOS, iOS, Android. Once installed, all your devices are on the same private network automatically.&lt;/p&gt;

&lt;p&gt;If you're looking for a simple private networking solution for a Kubernetes cluster — accessing databases locally, connecting team members, securing internal services — Tailscale is the easiest path. I'd recommend it for most use cases.&lt;/p&gt;

&lt;p&gt;I used Tailscale as my primary private network for a while and fully replaced WireGuard with it. But then my requirements changed.&lt;/p&gt;

&lt;h2&gt;
  
  
  Netbird: Why I Moved
&lt;/h2&gt;

&lt;p&gt;Two things pushed me toward Netbird:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Self-hosting potential.&lt;/strong&gt; Tailscale is a managed service. Your network topology goes through their coordination server. For now that's fine, but I want the option to fully self-host the control plane in the future. Netbird is fully open source — the management server, signal server, everything. I'm currently using Netbird cloud, but the option to move is there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Multi-DC connectivity.&lt;/strong&gt; The bigger reason. I have two clusters — Mongolia (on-premise) and Germany (cloud). I need them to communicate directly: NATS supercluster, Cassandra/ScyllaDB cross-DC replication, other services that need pod-to-pod reachability.&lt;/p&gt;

&lt;p&gt;The k8ssandra-operator is a good example of why this is hard. It needs pods in one DC to directly reach pods in the other DC by their pod IPs. You can't just expose a LoadBalancer service and call it done — the operator needs the actual pod network to be routable between clusters.&lt;/p&gt;

&lt;p&gt;Netbird handles this with its routing architecture. You configure network routes that make each cluster's pod CIDR reachable through Netbird peers. A Netbird peer in each cluster acts as a router for that cluster's network. Traffic between clusters flows through Netbird's encrypted tunnels, but pod IPs on both sides remain directly addressable.&lt;/p&gt;

&lt;p&gt;This also works through NAT — the Mongolia cluster is behind NAT, the Germany cluster is on a public IP. Netbird's hole-punching handles the NAT traversal automatically. No static IPs required on the Mongolia side.&lt;/p&gt;

&lt;p&gt;I tested this setup — pod-to-pod connectivity between Mongolia and Germany works. The full multi-DC architecture (NATS supercluster, cross-DC ScyllaDB replication) is a separate post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Netbird on Kubernetes
&lt;/h3&gt;

&lt;p&gt;Install the Netbird client as a DaemonSet or deployment in each cluster, register with your Netbird account, configure routes for the pod and service CIDRs. Each cluster becomes a peer on the Netbird network with routing configured for its internal networks.&lt;/p&gt;
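
&lt;p&gt;As a rough sketch of what that can look like: a minimal Deployment using the official container image and a setup key. The env var, image tag and required privileges may differ between Netbird versions, so treat this as a starting point rather than a reference:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: apps/v1
kind: Deployment
metadata:
  name: netbird-agent
  namespace: netbird
spec:
  replicas: 1
  selector:
    matchLabels:
      app: netbird-agent
  template:
    metadata:
      labels:
        app: netbird-agent
    spec:
      containers:
        - name: netbird
          image: netbirdio/netbird:latest
          env:
            - name: NB_SETUP_KEY              # setup key generated in the Netbird dashboard
              valueFrom:
                secretKeyRef:
                  name: netbird-setup-key
                  key: setup-key
          securityContext:
            capabilities:
              add: ["NET_ADMIN"]              # needed to manage the WireGuard interface
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
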

&lt;p&gt;The management UI lets you define access policies — which peers can reach which networks, which ports are allowed. Similar to Tailscale's access controls but fully open source.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;WireGuard&lt;/strong&gt; — solid foundation, full control, manual everything. Good if you want to understand what's happening underneath or have specific requirements that managed solutions don't cover.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tailscale&lt;/strong&gt; — WireGuard with all the operational work done for you. Best choice for most use cases: local development access, team access, securing internal services. I'd recommend this as the default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Netbird&lt;/strong&gt; — open source Tailscale alternative with better routing architecture for multi-cluster setups. Right choice when you need pod-to-pod reachability across clusters or want self-hosting as an option.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Current state: Netbird for everything. WireGuard and Tailscale are gone.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>wireguard</category>
      <category>tailscale</category>
      <category>netbird</category>
    </item>
    <item>
      <title>Persistent Storage in Kubernetes: From Longhorn to OpenEBS</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Mon, 20 Apr 2026 05:43:37 +0000</pubDate>
      <link>https://dev.to/yepchaos/persistent-storage-in-kubernetes-from-longhorn-to-openebs-5f7e</link>
      <guid>https://dev.to/yepchaos/persistent-storage-in-kubernetes-from-longhorn-to-openebs-5f7e</guid>
      <description>&lt;p&gt;Stateful workloads need storage that outlives pods. In Kubernetes, that means Persistent Volumes (PV) and Persistent Volume Claims (PVC) — a PV is the actual storage, a PVC is a pod's request for it. Kubernetes matches them and handles the binding. The interesting question is what backs those PVs.&lt;/p&gt;

&lt;p&gt;I started with Longhorn, realized it was too heavy for my cluster, benchmarked alternatives, and switched to OpenEBS. Here's the full story with numbers.&lt;/p&gt;

&lt;h2&gt;
  
  
  Longhorn: Good, But Overkill
&lt;/h2&gt;

&lt;p&gt;Longhorn is easy to install and comes with a solid UI, snapshots, backups, and synchronous replication across nodes. I installed it with Helm:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add longhorn https://charts.longhorn.io
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;longhorn longhorn/longhorn &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; longhorn-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--version&lt;/span&gt; 1.7.1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It worked. But on a 3-node cluster with limited resources, Longhorn consumes around &lt;strong&gt;1.5GB of memory&lt;/strong&gt; just for its own components — Instance Manager, CSI plugins, Longhorn Manager, and the UI.&lt;/p&gt;

&lt;p&gt;The bigger issue: my stateful apps (PostgreSQL, ScyllaDB) already handle their own replication. ScyllaDB replicates across nodes at the application level. PostgreSQL does the same. Adding storage-level replication on top is redundant — double the replication overhead, double the latency, for no benefit.&lt;/p&gt;

&lt;p&gt;I set replicas to 1 to avoid redundant replication:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;storage.k8s.io/v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;StorageClass&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-single-replica&lt;/span&gt;
&lt;span class="na"&gt;provisioner&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;driver.longhorn.io&lt;/span&gt;
&lt;span class="na"&gt;parameters&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;numberOfReplicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
  &lt;span class="na"&gt;dataLocality&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;best-effort"&lt;/span&gt;
&lt;span class="na"&gt;reclaimPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Delete&lt;/span&gt;
&lt;span class="na"&gt;volumeBindingMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;WaitForFirstConsumer&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Even with single replica, the 1.5GB memory overhead remained. For a small cluster where every GB matters, that's hard to justify.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarking the Options
&lt;/h2&gt;

&lt;p&gt;Before switching, I ran proper benchmarks using FIO on my actual cluster — 3-node CentOS VMs, the same hardware running everything else.&lt;/p&gt;

&lt;h3&gt;
  
  
  FIO Pod
&lt;/h3&gt;

&lt;p&gt;Same pod spec used across all three storage options, just swapping the PVC:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Pod&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fio-test&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;restartPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Never&lt;/span&gt;
  &lt;span class="na"&gt;containers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;fio&lt;/span&gt;
      &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ljishen/fio&lt;/span&gt;
      &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fio"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--name=pg-test&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--filename=/data/testfile&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--size=200M&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--bs=8k&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--rw=randrw&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--rwmixread=70&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--ioengine=libaio&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--iodepth=16&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--runtime=60&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--numjobs=1&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--time_based&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--group_reporting&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;256Mi"&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2"&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;512Mi"&lt;/span&gt;
      &lt;span class="na"&gt;volumeMounts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;mountPath&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/data&lt;/span&gt;
          &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testvol&lt;/span&gt;
  &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;testvol&lt;/span&gt;
      &lt;span class="na"&gt;persistentVolumeClaim&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;claimName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-pvc&lt;/span&gt;  &lt;span class="c1"&gt;# swap for local-pvc or openebs-pvc&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The FIO config simulates a database-like workload — 8k block size, 70/30 read/write mix, random I/O, 16 queue depth.&lt;/p&gt;
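
&lt;p&gt;If you want a raw-disk baseline to compare the numbers against, the same workload can be run with fio directly on a node, outside Kubernetes (the target path is an example):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;fio --name=pg-test --filename=/mnt/disks/testfile --size=200M \
  --bs=8k --rw=randrw --rwmixread=70 --ioengine=libaio \
  --iodepth=16 --runtime=60 --numjobs=1 --time_based --group_reporting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
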

&lt;h3&gt;
  
  
  Longhorn Setup
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-pvc&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;longhorn-single-replica&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Local PV Setup
&lt;/h3&gt;

&lt;p&gt;More manual — create the directory on the node first:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /mnt/disks/localdisk1
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;777 /mnt/disks/localdisk1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then create the PV and PVC manually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolume&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-pv&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;capacity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local-storage&lt;/span&gt;
  &lt;span class="na"&gt;local&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;/mnt/disks/localdisk1&lt;/span&gt;
  &lt;span class="na"&gt;nodeAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;nodeSelectorTerms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/hostname&lt;/span&gt;
              &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
              &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;k8s3&lt;/span&gt;
  &lt;span class="na"&gt;persistentVolumeReclaimPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Delete&lt;/span&gt;
  &lt;span class="na"&gt;volumeMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Filesystem&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
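


&lt;p&gt;The &lt;code&gt;local-storage&lt;/code&gt; class and the &lt;code&gt;local-pvc&lt;/code&gt; referenced in the FIO pod are unremarkable; the StorageClass just declares that there is no dynamic provisioner behind it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: local-pvc
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 1Gi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;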



&lt;p&gt;When using local PVs, the pod also needs node affinity to land on the right node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;affinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nodeAffinity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requiredDuringSchedulingIgnoredDuringExecution&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;nodeSelectorTerms&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;matchExpressions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;key&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kubernetes.io/hostname&lt;/span&gt;
              &lt;span class="na"&gt;operator&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;In&lt;/span&gt;
              &lt;span class="na"&gt;values&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;k8s3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the problem with local PVs at scale — every new volume needs manual directory creation, a manually written PV manifest, and node affinity on every pod that uses it. No dynamic provisioning. Painful to manage.&lt;/p&gt;

&lt;h3&gt;
  
  
  Results
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Longhorn&lt;/th&gt;
&lt;th&gt;Local PV&lt;/th&gt;
&lt;th&gt;OpenEBS&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Read IOPS&lt;/td&gt;
&lt;td&gt;811&lt;/td&gt;
&lt;td&gt;7757&lt;/td&gt;
&lt;td&gt;7401&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read Bandwidth&lt;/td&gt;
&lt;td&gt;6.3 MiB/s&lt;/td&gt;
&lt;td&gt;60.6 MiB/s&lt;/td&gt;
&lt;td&gt;57.8 MiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Read Latency (avg)&lt;/td&gt;
&lt;td&gt;14,189 µs&lt;/td&gt;
&lt;td&gt;1,467 µs&lt;/td&gt;
&lt;td&gt;1,539 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write IOPS&lt;/td&gt;
&lt;td&gt;346&lt;/td&gt;
&lt;td&gt;3328&lt;/td&gt;
&lt;td&gt;3177&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write Bandwidth&lt;/td&gt;
&lt;td&gt;2.7 MiB/s&lt;/td&gt;
&lt;td&gt;26.0 MiB/s&lt;/td&gt;
&lt;td&gt;24.8 MiB/s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Write Latency (avg)&lt;/td&gt;
&lt;td&gt;12,913 µs&lt;/td&gt;
&lt;td&gt;1,377 µs&lt;/td&gt;
&lt;td&gt;1,440 µs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CPU Usage (sys)&lt;/td&gt;
&lt;td&gt;4.71%&lt;/td&gt;
&lt;td&gt;26.25%&lt;/td&gt;
&lt;td&gt;26.05%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Memory Overhead&lt;/td&gt;
&lt;td&gt;~1.5 GB&lt;/td&gt;
&lt;td&gt;none&lt;/td&gt;
&lt;td&gt;~180 MB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Backend&lt;/td&gt;
&lt;td&gt;User-space&lt;/td&gt;
&lt;td&gt;Kernel block device&lt;/td&gt;
&lt;td&gt;Kernel block device&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Longhorn's numbers are significantly worse — 10x higher latency, ~10x lower IOPS. That's the cost of going through a user-space storage layer for every I/O operation. Local PV and OpenEBS both go through the kernel block device directly, which is why they're close to each other.&lt;/p&gt;

&lt;p&gt;Local PV wins on raw performance but loses on everything else — no dynamic provisioning, manual node affinity management, manual directory creation on each node. It doesn't scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenEBS: The Sweet Spot
&lt;/h2&gt;

&lt;p&gt;OpenEBS with hostpath provisioner gives us performance close to local PV with actual automation. It handles provisioning, metrics, and lifecycle. Memory overhead is ~180MB for the whole stack — 8x less than Longhorn.&lt;/p&gt;

&lt;p&gt;k3s has a built-in &lt;code&gt;local-path&lt;/code&gt; provisioner that's similar, but it also requires manually creating directories on each node and gives less control over the storage lifecycle. OpenEBS handles that automatically.&lt;/p&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add openebs https://openebs.github.io/openebs
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;openebs &lt;span class="nt"&gt;--namespace&lt;/span&gt; openebs openebs/openebs &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; engines.replicated.mayastor.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;false&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--set engines.replicated.mayastor.enabled=false&lt;/code&gt; disables Mayastor, OpenEBS's replicated storage engine. I don't need it — my apps handle their own replication. Disabling it keeps the footprint small.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Create the base directory once on each node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; /var/openebs/local
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then PVCs just reference the &lt;code&gt;openebs-hostpath&lt;/code&gt; storage class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;PersistentVolumeClaim&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openebs-pvc&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;accessModes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ReadWriteOnce&lt;/span&gt;
  &lt;span class="na"&gt;storageClassName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openebs-hostpath&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No manual PV creation, no node affinity on pods, no directory management per volume. OpenEBS handles it.&lt;/p&gt;
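
&lt;p&gt;If you want the volumes on a dedicated disk instead of &lt;code&gt;/var/openebs/local&lt;/code&gt;, the hostpath provisioner can be pointed elsewhere with a custom StorageClass. The config format below is what I understand the provisioner to accept; the path is an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath-ssd
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      - name: StorageType
        value: hostpath
      - name: BasePath
        value: /mnt/ssd/openebs        # example path on a dedicated disk
provisioner: openebs.io/local
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
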

&lt;h2&gt;
  
  
  Current State
&lt;/h2&gt;

&lt;p&gt;Everything stateful on the cluster — PostgreSQL, ScyllaDB, Redis, NATS — uses OpenEBS with &lt;code&gt;openebs-hostpath&lt;/code&gt;. Longhorn is gone.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>longhorn</category>
      <category>openebs</category>
    </item>
    <item>
      <title>Monitoring &amp; Observability</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Mon, 20 Apr 2026 05:38:52 +0000</pubDate>
      <link>https://dev.to/yepchaos/monitoring-observability-3mdl</link>
      <guid>https://dev.to/yepchaos/monitoring-observability-3mdl</guid>
      <description>&lt;p&gt;Metrics tell us something is wrong. Logs tell us why. We need both. This post covers how I set up the full observability stack for ASTRING — Prometheus and Grafana for metrics, Fluent Bit and Loki for logs, and Alertmanager.&lt;/p&gt;

&lt;h2&gt;
  
  
  Metrics: Prometheus and Grafana
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Prometheus
&lt;/h3&gt;

&lt;p&gt;Prometheus is the standard for Kubernetes monitoring. It scrapes &lt;code&gt;/metrics&lt;/code&gt; endpoints from our services and stores everything as time series data. PromQL lets us query and aggregate across that data. It also handles alerting rules, which I'll get to later.&lt;/p&gt;

&lt;p&gt;The easiest way to get the full stack running on Kubernetes is &lt;code&gt;kube-prometheus-stack&lt;/code&gt; — it bundles Prometheus, Grafana, Alertmanager, and a set of pre-built dashboards and alerting rules for Kubernetes components.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing kube-prometheus-stack
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;prometheus prometheus-community/kube-prometheus-stack &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; prometheus &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--values&lt;/span&gt; values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A minimal &lt;code&gt;values.yaml&lt;/code&gt; to get started:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;grafana&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;adminUser&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;admin&lt;/span&gt;
  &lt;span class="na"&gt;adminPassword&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_password&lt;/span&gt;
  &lt;span class="na"&gt;service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;NodePort&lt;/span&gt;
    &lt;span class="na"&gt;nodePort&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once it's running, Grafana comes with pre-built dashboards for cluster health, node resource usage, pod performance, and Kubernetes component metrics. I actively use these — mostly for checking memory and CPU trends across the cluster.&lt;/p&gt;
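
&lt;p&gt;For application metrics, the operator in the stack watches ServiceMonitor objects, so pointing Prometheus at a service's &lt;code&gt;/metrics&lt;/code&gt; endpoint is one small manifest. The app name, namespace and port name below are illustrative; the &lt;code&gt;release: prometheus&lt;/code&gt; label matches the Helm release name so the operator picks it up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: chat-backend
  namespace: prometheus
  labels:
    release: prometheus        # must match the kube-prometheus-stack release name
spec:
  namespaceSelector:
    matchNames:
      - chat
  selector:
    matchLabels:
      app: chat-backend
  endpoints:
    - port: http-metrics       # named port on the Service that exposes /metrics
      path: /metrics
      interval: 30s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
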

&lt;h2&gt;
  
  
  Logs: Why Not ELK
&lt;/h2&gt;

&lt;p&gt;The standard alternative to what I'm using is the ELK stack — Elasticsearch, Logstash, Kibana. It's powerful but heavy. Elasticsearch automatically creates indexes for everything it ingests, which means significant memory and CPU overhead even at low log volumes. On a 3-node cluster with limited resources, running Elasticsearch alongside everything else didn't make sense. It also adds Kibana as a separate UI, which means maintaining two dashboards.&lt;/p&gt;

&lt;p&gt;Loki takes a different approach — it indexes only metadata (labels like pod name, namespace, container), not the full log content. The logs themselves are stored compressed in object storage. This makes it much lighter to run and cheaper to store. Since it's built by Grafana Labs, it integrates directly into Grafana as a data source — same dashboard for metrics and logs.&lt;/p&gt;

&lt;p&gt;Fluent Bit runs as a DaemonSet on every node, tails container log files, and ships them to Loki. It's lightweight by design, built for high-throughput log forwarding without consuming much memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Loki
&lt;/h2&gt;

&lt;p&gt;Loki stores logs in S3-compatible object storage — I use Cloudflare R2.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add grafana https://grafana.github.io/helm-charts
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;loki grafana/loki &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; logging &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--create-namespace&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important parts of &lt;code&gt;values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;deploymentMode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;SingleBinary&lt;/span&gt;

&lt;span class="na"&gt;loki&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;commonConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;replication_factor&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;ingester&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;chunk_encoding&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;snappy&lt;/span&gt;
  &lt;span class="na"&gt;querier&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;max_concurrent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
  &lt;span class="na"&gt;schemaConfig&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;from&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2024-06-01"&lt;/span&gt;
        &lt;span class="na"&gt;index&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;24h&lt;/span&gt;
          &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;loki_index_&lt;/span&gt;
        &lt;span class="na"&gt;object_store&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3&lt;/span&gt;
        &lt;span class="na"&gt;schema&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v13&lt;/span&gt;
        &lt;span class="na"&gt;store&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;tsdb&lt;/span&gt;
  &lt;span class="na"&gt;storage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;bucketNames&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;admin&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;bucket_name&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;chunks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;bucket_name&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;ruler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;bucket_name&amp;gt;&lt;/span&gt;
    &lt;span class="na"&gt;s3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;accessKeyId&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;access_key&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;secretAccessKey&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;&amp;lt;secret_key&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;s3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3://&amp;lt;access_key&amp;gt;:&amp;lt;secret_key&amp;gt;@&amp;lt;r2_endpoint&amp;gt;/&amp;lt;bucket_name&amp;gt;&lt;/span&gt;
      &lt;span class="na"&gt;s3ForcePathStyle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
      &lt;span class="na"&gt;insecure&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;s3&lt;/span&gt;

&lt;span class="na"&gt;singleBinary&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;3Gi&lt;/span&gt;
    &lt;span class="na"&gt;requests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;2&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;1Gi&lt;/span&gt;
  &lt;span class="na"&gt;extraEnv&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;GOMEMLIMIT&lt;/span&gt;
      &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;2750MiB&lt;/span&gt;

&lt;span class="na"&gt;minio&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;enabled&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;singleBinary&lt;/code&gt; mode runs everything in one pod — suitable for a small cluster. &lt;code&gt;chunk_encoding: snappy&lt;/code&gt; compresses logs before storing them in R2, which reduces storage costs. &lt;code&gt;GOMEMLIMIT&lt;/code&gt; caps Go's memory usage so it stays within the pod's memory limit — the same class of issue as GOMAXPROCS, but for memory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up Fluent Bit
&lt;/h2&gt;

&lt;p&gt;Fluent Bit runs as a DaemonSet — one pod per node, tailing all container logs at &lt;code&gt;/var/log/containers/*.log&lt;/code&gt; and forwarding to Loki.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;fluent-bit fluent/fluent-bit &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; logging &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; values.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The important parts of &lt;code&gt;values.yaml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;args&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;-e&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/fluent-bit/bin/out_grafana_loki.so&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--workdir=/fluent-bit/etc&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;--config=/fluent-bit/etc/conf/fluent-bit.conf&lt;/span&gt;

&lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;inputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;[INPUT]&lt;/span&gt;
        &lt;span class="s"&gt;Name tail&lt;/span&gt;
        &lt;span class="s"&gt;Tag kube.*&lt;/span&gt;
        &lt;span class="s"&gt;Path /var/log/containers/*.log&lt;/span&gt;
        &lt;span class="s"&gt;multiline.parser docker, cri&lt;/span&gt;
        &lt;span class="s"&gt;Mem_Buf_Limit 5MB&lt;/span&gt;
        &lt;span class="s"&gt;Skip_Long_Lines On&lt;/span&gt;

  &lt;span class="na"&gt;outputs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;|&lt;/span&gt;
    &lt;span class="s"&gt;[Output]&lt;/span&gt;
        &lt;span class="s"&gt;Name grafana-loki&lt;/span&gt;
        &lt;span class="s"&gt;Match kube.*&lt;/span&gt;
        &lt;span class="s"&gt;Url ${FLUENT_LOKI_URL}&lt;/span&gt;
        &lt;span class="s"&gt;TenantID foo&lt;/span&gt;
        &lt;span class="s"&gt;Labels {job="fluent-bit"}&lt;/span&gt;
        &lt;span class="s"&gt;LabelKeys level,app&lt;/span&gt;
        &lt;span class="s"&gt;BatchWait 1&lt;/span&gt;
        &lt;span class="s"&gt;BatchSize 1001024&lt;/span&gt;
        &lt;span class="s"&gt;LineFormat json&lt;/span&gt;
        &lt;span class="s"&gt;LogLevel info&lt;/span&gt;
        &lt;span class="s"&gt;AutoKubernetesLabels true&lt;/span&gt;

&lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;FLUENT_LOKI_URL&lt;/span&gt;
    &lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;http://loki-gateway.logging.svc.cluster.local/loki/api/v1/push&lt;/span&gt;

&lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;repository&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;grafana/fluent-bit-plugin-loki&lt;/span&gt;
  &lt;span class="na"&gt;tag&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;main-e2ed1c0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;AutoKubernetesLabels true&lt;/code&gt; automatically attaches Kubernetes metadata (pod name, namespace, container name) as Loki labels — this makes filtering logs in Grafana much more useful. &lt;code&gt;LabelKeys level,app&lt;/code&gt; promotes those specific fields into Loki stream labels; everything else becomes structured metadata.&lt;/p&gt;

&lt;h2&gt;
  
  
  Connecting Loki to Grafana
&lt;/h2&gt;

&lt;p&gt;In Grafana, add Loki as a data source:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Go to &lt;strong&gt;Configuration → Data Sources → Add data source&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Select &lt;strong&gt;Loki&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Set URL to &lt;code&gt;http://loki-gateway.logging.svc.cluster.local&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Click &lt;strong&gt;Save &amp;amp; Test&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now logs are queryable in the Explore tab using LogQL, and we can build dashboards that combine metrics from Prometheus and logs from Loki in the same view.&lt;/p&gt;
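
&lt;p&gt;The same LogQL you type into Explore can also be sent straight to the gateway, which is handy for confirming ingestion before Grafana is wired up. A rough sketch, run from a pod inside the cluster (the &lt;code&gt;app&lt;/code&gt; label and the tenant header mirror the Fluent Bit config above, so adjust them to whatever your logs actually carry):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# query recent error lines for one app through the Loki gateway
curl -s -G "http://loki-gateway.logging.svc.cluster.local/loki/api/v1/query_range" \
  -H "X-Scope-OrgID: foo" \
  --data-urlencode 'query={app="astring"} |= "error"' \
  --data-urlencode 'limit=20'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
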

&lt;h2&gt;
  
  
  Current State
&lt;/h2&gt;

&lt;p&gt;The full observability stack running on the cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prometheus&lt;/strong&gt; — scraping metrics from all services and Kubernetes components&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grafana&lt;/strong&gt; — dashboards for cluster health, pod performance, and logs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Alertmanager&lt;/strong&gt; — firing alerts to Telegram on pod crashes, high memory, and disk usage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Loki&lt;/strong&gt; — storing logs in Cloudflare R2&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fluent Bit&lt;/strong&gt; — collecting and forwarding logs from every node&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I actively use Grafana for both metrics and logs. When something goes wrong, Telegram fires first, then I open Grafana to dig into what happened.&lt;/p&gt;

</description>
      <category>monitoring</category>
      <category>observability</category>
      <category>kubernetes</category>
    </item>
    <item>
      <title>K3s, MetalLB &amp; Cilium</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Thu, 16 Apr 2026 09:07:08 +0000</pubDate>
      <link>https://dev.to/yepchaos/k3s-metallb-cilium-e9i</link>
      <guid>https://dev.to/yepchaos/k3s-metallb-cilium-e9i</guid>
      <description>&lt;p&gt;This post covers how I set up the on-premise Kubernetes cluster — picking a distribution, getting k3s running on CentOS, solving load balancing with MetalLB, and eventually replacing both MetalLB and the default CNI with Cilium.&lt;/p&gt;

&lt;h2&gt;
  
  
  Picking a Distribution
&lt;/h2&gt;

&lt;p&gt;There are several lightweight Kubernetes distributions for on-prem setups like RKE, k0s, MicroK8s, and k3s. For a small cluster, I care mostly about simplicity and footprint. k3s fits that well — it’s a single binary under 100MB, easy to install, and doesn’t bring much overhead. It’s still stable enough for production use and handles multi-node setups without much complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up k3s
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Firewall First
&lt;/h3&gt;

&lt;p&gt;Before installing anything, open the necessary ports on all nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Allow essential services&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--add-service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ssh
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--add-service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;http
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--add-service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;https

&lt;span class="c"&gt;# Trust pod and service networks&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;trusted &lt;span class="nt"&gt;--add-source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10.42.0.0/16  &lt;span class="c"&gt;# Pods CIDR&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;trusted &lt;span class="nt"&gt;--add-source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10.43.0.0/16  &lt;span class="c"&gt;# Services CIDR&lt;/span&gt;

&lt;span class="c"&gt;# k3s-specific ports&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--new-service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s &lt;span class="nt"&gt;--set-description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"K3s Firewall Rules"&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s &lt;span class="nt"&gt;--add-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2379-2380/tcp  &lt;span class="c"&gt;# etcd&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s &lt;span class="nt"&gt;--add-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6443/tcp       &lt;span class="c"&gt;# API server&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s &lt;span class="nt"&gt;--add-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;8472/udp       &lt;span class="c"&gt;# Flannel VXLAN&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s &lt;span class="nt"&gt;--add-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;10250-10252/tcp &lt;span class="c"&gt;# Kubelet&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s &lt;span class="nt"&gt;--add-port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30000-32767/tcp &lt;span class="c"&gt;# NodePort&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--permanent&lt;/span&gt; &lt;span class="nt"&gt;--add-service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;k3s
&lt;span class="nb"&gt;sudo &lt;/span&gt;firewall-cmd &lt;span class="nt"&gt;--reload&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Master Node
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="nv"&gt;INSTALL_K3S_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"v1.31.1+k3s1"&lt;/span&gt; sh &lt;span class="nt"&gt;-s&lt;/span&gt; - server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-init&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;traefik
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--cluster-init&lt;/code&gt; initializes a new etcd-backed cluster. I disabled Traefik here because I use NGINX for ingress instead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After installation, grab the node token for the worker nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo cat&lt;/span&gt; /var/lib/rancher/k3s/server/node-token
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Worker Nodes
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="nv"&gt;INSTALL_K3S_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"v1.31.1+k3s1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;cluster_token&amp;gt; sh &lt;span class="nt"&gt;-s&lt;/span&gt; - server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--server&lt;/span&gt; https://&amp;lt;master_ip&amp;gt;:6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;traefik
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note I'm running workers as &lt;code&gt;server&lt;/code&gt; nodes, not &lt;code&gt;agent&lt;/code&gt; nodes. This means all three nodes run the control plane — full HA with etcd across all nodes. If any one node goes down, the cluster keeps running.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verify
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get nodes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All nodes should show as Ready.&lt;/p&gt;

&lt;h3&gt;
  
  
  NGINX Ingress
&lt;/h3&gt;

&lt;p&gt;With Traefik disabled, install NGINX for ingress:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.9.1/deploy/static/provider/cloud/deploy.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Load Balancing: MetalLB
&lt;/h2&gt;

&lt;p&gt;On cloud Kubernetes, creating a &lt;code&gt;Service&lt;/code&gt; of type &lt;code&gt;LoadBalancer&lt;/code&gt; automatically provisions a cloud load balancer. On-premise, Kubernetes has no implementation for this by default — services just sit in &lt;code&gt;&amp;lt;pending&amp;gt;&lt;/code&gt; forever waiting for an external IP that never comes.&lt;/p&gt;

&lt;p&gt;MetalLB solves this. It implements the &lt;code&gt;LoadBalancer&lt;/code&gt; service type for bare-metal clusters using ARP at Layer 2 — when a service gets an IP from the pool, MetalLB announces it on the local network so traffic routes to the right node.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; https://raw.githubusercontent.com/metallb/metallb/v0.13.10/config/manifests/metallb-native.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Configure
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metallb.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;IPAddressPool&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;pool&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metallb-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;addresses&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;10.20.30.100-10.20.30.105&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metallb.io/v1beta1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;L2Advertisement&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;l2-advertisement&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;metallb-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;ipAddressPools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;pool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; metallb.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The IP range &lt;code&gt;10.20.30.100-10.20.30.105&lt;/code&gt; is the actual range on my network. Make sure the IPs you use aren't in your DHCP server's allocation range or assigned to anything else.&lt;/p&gt;
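
&lt;p&gt;A quick sanity check is to expose something as a &lt;code&gt;LoadBalancer&lt;/code&gt; and see whether it picks up an address from the pool. The nginx deployment here is just a throwaway placeholder:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# throwaway deployment and service to test the pool
kubectl create deployment lb-test --image=nginx
kubectl expose deployment lb-test --port=80 --type=LoadBalancer

# EXTERNAL-IP should come from 10.20.30.100-10.20.30.105 instead of staying &amp;lt;pending&amp;gt;
kubectl get svc lb-test

# clean up afterwards
kubectl delete svc,deployment lb-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
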

&lt;p&gt;MetalLB worked well. But then I started reading about Cilium.&lt;/p&gt;

&lt;h2&gt;
  
  
  Replacing MetalLB and Flannel with Cilium
&lt;/h2&gt;

&lt;p&gt;Cilium is a networking, security, and load balancing solution for Kubernetes built on eBPF — a Linux kernel technology that lets you run code in kernel space safely, without writing kernel modules. The main draws for me were the eBPF angle (genuinely interesting technology), the security features, and the observability tooling (Hubble and Tetragon). It can also replace both the CNI (Flannel in k3s's case) and MetalLB, which simplifies the stack.&lt;/p&gt;

&lt;p&gt;To be honest — at my current scale (small cluster, few users), I haven't noticed any measurable performance difference from the switch. I did this to learn and explore, not because I was hitting limits. But the observability alone has been worth it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reinstall k3s Without Flannel and kube-proxy
&lt;/h3&gt;

&lt;p&gt;To use Cilium as the CNI, k3s needs to be installed without its default networking components. On the master node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="nv"&gt;INSTALL_K3S_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"v1.31.1+k3s1"&lt;/span&gt; sh &lt;span class="nt"&gt;-s&lt;/span&gt; - server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cluster-init&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;servicelb &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;traefik &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--flannel-backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;none &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable-network-policy&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable-kube-proxy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On worker nodes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sfL&lt;/span&gt; https://get.k3s.io | &lt;span class="nv"&gt;INSTALL_K3S_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"v1.31.1+k3s1"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nv"&gt;K3S_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;token&amp;gt; sh &lt;span class="nt"&gt;-s&lt;/span&gt; - server &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--server&lt;/span&gt; https://&amp;lt;master_ip&amp;gt;:6443 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;servicelb &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;traefik &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--flannel-backend&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;none &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable-network-policy&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--disable-kube-proxy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--flannel-backend=none&lt;/code&gt; removes the default CNI.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--disable-kube-proxy&lt;/code&gt; removes kube-proxy since Cilium replaces it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--disable=servicelb&lt;/code&gt; removes k3s's built-in service load balancer.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Install Cilium — Basic First
&lt;/h3&gt;

&lt;p&gt;I started with a basic Cilium install to make sure networking worked before adding L2 load balancing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm repo add cilium https://helm.cilium.io/
helm repo update

helm &lt;span class="nb"&gt;install &lt;/span&gt;cilium cilium/cilium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; kube-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;kubeProxyReplacement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;strict &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServiceHost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;127.0.0.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServicePort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6444
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;kubeProxyReplacement=strict&lt;/code&gt; tells Cilium to fully replace kube-proxy using eBPF instead of iptables for service routing.&lt;/li&gt;
&lt;/ul&gt;
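
&lt;p&gt;Before layering anything else on top, it's worth checking the agent is actually healthy. Assuming the cilium CLI is installed locally, something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# overall health of the agent and operator
cilium status --wait

# optional end-to-end connectivity test (spins up test pods, takes a few minutes)
cilium connectivity test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
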

&lt;h3&gt;
  
  
  Key Considerations for this Setup:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Localhost vs Node IP:&lt;/strong&gt; Using &lt;code&gt;127.0.0.1&lt;/code&gt; for &lt;code&gt;k8sServiceHost&lt;/code&gt; is ideal if Cilium is running as a DaemonSet on a node where K3s is also running (like a single-node or small-scale HA setup), as it hits the K3s supervisor proxy directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;How it works:&lt;/strong&gt; When we point Cilium to &lt;code&gt;127.0.0.1:6444&lt;/code&gt;, it talks to the local K3s agent. This agent maintains a dynamic list of all available Master nodes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Failover:&lt;/strong&gt; If the current Master fails, the local proxy immediately reroutes Cilium’s traffic to a healthy Master.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The Benefit:&lt;/strong&gt; Cilium remains connected to &lt;code&gt;localhost&lt;/code&gt;, completely unaware of the backend failure. This ensures &lt;strong&gt;seamless networking uptime&lt;/strong&gt; and removes the need for external infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Add L2 Load Balancing
&lt;/h3&gt;

&lt;p&gt;Once I confirmed everything was working, I upgraded to enable L2 announcements — Cilium's built-in equivalent of MetalLB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade cilium cilium/cilium &lt;span class="nt"&gt;--version&lt;/span&gt; 1.16.3 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; kube-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; operator.replicas&lt;span class="o"&gt;=&lt;/span&gt;1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; l2announcements.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; externalIPs.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;kubeProxyReplacement&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;strict &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServiceHost&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;127.0.0.1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; &lt;span class="nv"&gt;k8sServicePort&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;6444 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; k8sClientRateLimit.qps&lt;span class="o"&gt;=&lt;/span&gt;50 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; k8sClientRateLimit.burst&lt;span class="o"&gt;=&lt;/span&gt;100
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then configure the announcement policy — which interfaces Cilium uses to announce IPs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumL2AnnouncementPolicy&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default-l2-announcement-policy&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kube-system&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;nodeSelector&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;{}&lt;/span&gt;
  &lt;span class="na"&gt;interfaces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;ens192&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;^eth[0-9]+'&lt;/span&gt;
  &lt;span class="na"&gt;externalIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
  &lt;span class="na"&gt;loadBalancerIPs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And the IP pool — same range I had in MetalLB:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;cilium.io/v2alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;CiliumLoadBalancerIPPool&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default-pool&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;blocks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;cidr&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10.20.30.100/29&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; cilium-l2-announcement-policy.yaml
kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; cilium-load-balancer-ip-pool.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
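
&lt;p&gt;Once a &lt;code&gt;LoadBalancer&lt;/code&gt; service exists, Cilium elects one node to answer ARP for its IP and tracks that election with a Kubernetes lease. Checking for those leases is a quick way to confirm announcements are actually happening (lease naming as of the versions I've run):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# one lease per announced service, named cilium-l2announce-&amp;lt;namespace&amp;gt;-&amp;lt;service&amp;gt;
kubectl -n kube-system get leases | grep cilium-l2announce
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
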



&lt;h3&gt;
  
  
  Enable Hubble
&lt;/h3&gt;

&lt;p&gt;Hubble is Cilium's observability platform — real-time network flow monitoring with a UI. I use it occasionally to debug traffic and see what's happening in the cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;helm upgrade cilium cilium/cilium &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--namespace&lt;/span&gt; kube-system &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--reuse-values&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; hubble.relay.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set&lt;/span&gt; hubble.ui.enabled&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
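
&lt;p&gt;The UI is the easiest entry point, but the CLI is useful for quick checks too. Assuming the cilium and hubble CLIs are installed locally, roughly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# open the Hubble UI (sets up the port-forward for you)
cilium hubble ui

# or stream flows in the terminal
cilium hubble port-forward &amp;amp;
hubble observe --namespace astring --follow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
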



&lt;h2&gt;
  
  
  Current State
&lt;/h2&gt;

&lt;p&gt;The cluster runs k3s on three CentOS VMs, all as server nodes with etcd for HA. Cilium handles CNI, kube-proxy replacement, and L2 load balancing. NGINX handles ingress. MetalLB is gone.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>cilium</category>
      <category>k3s</category>
      <category>metallb</category>
    </item>
    <item>
      <title>CI/CD, GitLab Pipelines and Kaniko</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Thu, 16 Apr 2026 08:38:36 +0000</pubDate>
      <link>https://dev.to/yepchaos/cicd-gitlab-pipelines-and-kaniko-4oie</link>
      <guid>https://dev.to/yepchaos/cicd-gitlab-pipelines-and-kaniko-4oie</guid>
      <description>&lt;p&gt;CI/CD automates the build and deployment process — push code, pipeline runs, new version deployed on the cluster. Here's how I set it up for ASTRING using GitLab CI/CD, and why I ended up switching from Docker-in-Docker to Kaniko.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Initial Pipeline
&lt;/h2&gt;

&lt;p&gt;The first version used Docker-in-Docker (DinD) — a standard approach where the CI job spins up a Docker daemon inside a container to build the image.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;dockerize&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;

&lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;IMAGE_TAG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA&lt;/span&gt;

&lt;span class="na"&gt;dockerize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerize&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;docker:24.0.5&lt;/span&gt;
  &lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker:dind&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker build -t $IMAGE_TAG .&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker tag $IMAGE_TAG $CI_REGISTRY_IMAGE:latest&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;echo $CI_REGISTRY_PASSWORD | docker login -u $CI_REGISTRY_USER --password-stdin $CI_REGISTRY&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker push $IMAGE_TAG&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;docker push $CI_REGISTRY_IMAGE:latest&lt;/span&gt;

&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bitnami/kubectl:1.31&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mkdir -p ~/.kube&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;echo "$KUBECONFIG_BASE64" | base64 -d &amp;gt; ~/.kube/config&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl apply -f deployment/astring/deployment.yaml&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl apply -f deployment/astring/service.yaml&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl apply -f deployment/astring/ingress.yaml&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl rollout restart deployment/astring-backend -n astring&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This worked locally but broke as soon as I moved the GitLab runner onto the on-premise Kubernetes cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with DinD on k3s
&lt;/h2&gt;

&lt;p&gt;My GitLab runner runs as a pod on the same k3s cluster that hosts everything else. k3s uses containerd as its container runtime, not Docker. DinD assumes a Docker daemon is available — it tries to talk to &lt;code&gt;/var/run/docker.sock&lt;/code&gt;, which doesn't exist in a containerd environment. The build stage just failed.&lt;/p&gt;

&lt;p&gt;Beyond the compatibility issue, DinD also requires privileged containers to run the nested Docker daemon, which is a security concern on a shared cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Switching to Kaniko
&lt;/h2&gt;

&lt;p&gt;Kaniko is a tool that builds container images from a Dockerfile without needing a Docker daemon. It runs entirely in userspace, reads the Dockerfile layer by layer, and pushes the result directly to a registry. No privileged container, no Docker socket, works fine on containerd.&lt;/p&gt;

&lt;p&gt;Here's the updated pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;dockerize&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;

&lt;span class="na"&gt;variables&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;IMAGE_TAG&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA&lt;/span&gt;

&lt;span class="na"&gt;dockerize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;dockerize&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;gcr.io/kaniko-project/executor:v1.23.2-debug&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;/kaniko/executor \&lt;/span&gt;
        &lt;span class="s"&gt;--context "${CI_PROJECT_DIR}" \&lt;/span&gt;
        &lt;span class="s"&gt;--dockerfile "${CI_PROJECT_DIR}/Dockerfile" \&lt;/span&gt;
        &lt;span class="s"&gt;--destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_SHORT_SHA}" \&lt;/span&gt;
        &lt;span class="s"&gt;--destination "${CI_REGISTRY_IMAGE}:latest" \&lt;/span&gt;
        &lt;span class="s"&gt;--cache=true&lt;/span&gt;

&lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;stage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;deploy&lt;/span&gt;
  &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bitnami/kubectl:1.31&lt;/span&gt;
    &lt;span class="na"&gt;entrypoint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;script&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;mkdir -p ~/.kube&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;echo "$KUBECONFIG_BASE64" | base64 -d &amp;gt; ~/.kube/config&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl apply -f deployment/astring/deployment.yaml&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl apply -f deployment/astring/service.yaml&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl apply -f deployment/astring/ingress.yaml&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;kubectl rollout restart deployment/astring-backend -n astring&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;dockerize&lt;/code&gt; stage now uses the Kaniko executor image directly. No services block, no Docker socket, no privileged mode needed. &lt;code&gt;--cache=true&lt;/code&gt; reuses unchanged layers across builds, which makes subsequent builds significantly faster.&lt;/p&gt;

&lt;h2&gt;
  
  
  How the Deploy Stage Works
&lt;/h2&gt;

&lt;p&gt;The deploy stage uses &lt;code&gt;kubectl&lt;/code&gt; to apply the manifests and trigger a rolling restart. It's only meant for testing purposes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubeconfig&lt;/strong&gt; is stored as a base64-encoded GitLab CI variable (&lt;code&gt;KUBECONFIG_BASE64&lt;/code&gt;). The pipeline decodes it at runtime and writes it to &lt;code&gt;~/.kube/config&lt;/code&gt;. &lt;/p&gt;
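
&lt;p&gt;For reference, the variable value is just the kubeconfig run through base64. Something like this produces the string to paste into GitLab (&lt;code&gt;-w 0&lt;/code&gt; keeps it on one line with GNU coreutils):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# single-line base64 string for the KUBECONFIG_BASE64 CI variable
base64 -w 0 ~/.kube/config
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
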

&lt;p&gt;&lt;strong&gt;Image tagging&lt;/strong&gt; uses the commit SHA (&lt;code&gt;$CI_COMMIT_SHORT_SHA&lt;/code&gt;) so every build produces a uniquely tagged image. The deployment manifest references the SHA tag, not &lt;code&gt;latest&lt;/code&gt; — this ensures &lt;code&gt;kubectl rollout restart&lt;/code&gt; actually pulls the new image. If we use &lt;code&gt;latest&lt;/code&gt; without &lt;code&gt;imagePullPolicy: Always&lt;/code&gt;, Kubernetes may skip the pull and restart with the cached image, which is not what we want.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Rolling restart&lt;/strong&gt; means Kubernetes updates pods one at a time — new pod comes up, old pod goes down. Zero downtime deployment without any extra configuration.&lt;/p&gt;
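
&lt;p&gt;If you want the job to wait until the rollout actually finishes (and fail if it doesn't), a status check can be appended to the deploy script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# blocks until the new pods are ready; exits non-zero on failure or timeout
kubectl rollout status deployment/astring-backend -n astring --timeout=120s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
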

&lt;h2&gt;
  
  
  Secrets
&lt;/h2&gt;

&lt;p&gt;Config files and keys don't go into the image. They're stored as Kubernetes secrets and mounted into the pod at runtime. The pipeline itself only needs registry credentials and the kubeconfig — both stored as GitLab CI variables, never in the repo.&lt;/p&gt;
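
&lt;p&gt;As a rough sketch of what that looks like (the names are illustrative, not the real ones): create the secret once, then mount it in the deployment spec.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# create the secret from a local config file (illustrative names)
kubectl create secret generic astring-backend-config \
  --from-file=config.yaml=./config.yaml \
  -n astring

# then mount it read-only in the deployment manifest:
#   volumes:
#     - name: app-config
#       secret:
#         secretName: astring-backend-config
#   containers:
#     - volumeMounts:
#         - name: app-config
#           mountPath: /etc/astring
#           readOnly: true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
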

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;This pipeline handles the backend. I'll write about the full cluster setup — k3s configuration, networking, ingress, and how everything is organized — in the next post.&lt;/p&gt;

</description>
      <category>cicd</category>
      <category>gitlab</category>
    </item>
    <item>
      <title>Front-End &amp; Struggles</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Fri, 10 Apr 2026 13:43:31 +0000</pubDate>
      <link>https://dev.to/yepchaos/front-end-struggles-14f9</link>
      <guid>https://dev.to/yepchaos/front-end-struggles-14f9</guid>
      <description>&lt;p&gt;I didn’t have much frontend experience. This post covers the struggles I ran into.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Point: React + TypeScript + Plain CSS
&lt;/h2&gt;

&lt;p&gt;React with TypeScript felt like the obvious choice — popular, good ecosystem, type safety. I started writing plain CSS modules for styling. Full control, right?&lt;/p&gt;

&lt;p&gt;The problem wasn't the code. It was that I had no design vision. I'd write a component, look at it, and know it looked bad but not know how to fix it. What colors? How much padding? How should this align? I couldn't answer these questions. The feedback loop was: write code → look bad → feel stuck → repeat. This went on for weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tailwind CSS Didn't Solve the Real Problem
&lt;/h2&gt;

&lt;p&gt;I switched to Tailwind CSS thinking it would help. It sped things up — utility classes are fast to write, no context switching between files. But Tailwind is a tool for people who already know what they want to build. It doesn't give you design vision, it just makes it faster to execute one. I still didn't know what I wanted.&lt;/p&gt;

&lt;p&gt;I tried Figma. I couldn't make anything that looked good there either. The problem wasn't the tools.&lt;/p&gt;

&lt;p&gt;I restarted the project multiple times during this period — changing UI approach, structure, and direction. It felt like progress but mostly wasn't. This lasted around 3-4 weeks. Eventually I accepted that I wasn't going to figure out design from scratch and looked for a component library.&lt;/p&gt;

&lt;h2&gt;
  
  
  shadcn/ui
&lt;/h2&gt;

&lt;p&gt;I looked at Material UI and Ant Design. Both felt heavy and opinionated in ways that would fight me later. I wanted something that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Looked good out of the box&lt;/li&gt;
&lt;li&gt;Integrated with Tailwind&lt;/li&gt;
&lt;li&gt;I could actually own the code&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;shadcn/ui fit all of this. The key difference is that components are generated into your codebase rather than installed as a dependency. This gives full control over behavior and styling, without being constrained by a library’s abstraction. That turned out to matter a lot later when I needed to adapt components for React Native.&lt;/p&gt;

&lt;h2&gt;
  
  
  Next.js: Tried It, Left It
&lt;/h2&gt;

&lt;p&gt;Around this time I migrated to Next.js. SSR, SSG, the whole thing. After a while I realized most of my components were client-side anyway. ASTRING is a chat app — it's dynamic content fetched after load, not static pages that benefit from SSR. I moved back to React with Vite. Faster dev server, simpler setup, no framework fighting back.&lt;/p&gt;

&lt;h2&gt;
  
  
  State Management: Jotai
&lt;/h2&gt;

&lt;p&gt;Chat state is genuinely complex — active rooms, message lists, unread counts, real-time updates coming in from WebSocket, user presence. Redux felt like too much ceremony for this. Context API caused re-render problems as state got more interconnected.&lt;/p&gt;

&lt;p&gt;Jotai worked well. Atoms are simple to create, updates are granular, and the mental model maps cleanly to "this piece of state, these components that care about it." Chat state in particular became much cleaner — each room's state is an atom, components subscribe only to what they need.&lt;/p&gt;

&lt;h2&gt;
  
  
  React Native: Why I Left Ionic
&lt;/h2&gt;

&lt;p&gt;I built the mobile version with Ionic React first. Code reuse from the web was easy. But Ionic started showing limits for a chat app specifically — animations felt off, native components were lacking, the "native feel" wasn't there. Chat apps have specific UX expectations: smooth scrolling through message history, keyboard handling, swipe gestures, native-feeling transitions. Ionic's web-based approach couldn't deliver these well enough.&lt;/p&gt;

&lt;p&gt;I moved to React Native with Expo. Expo makes the setup significantly easier — no manual native build configuration, good tooling, OTA updates.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current Setup: Monorepo with pnpm Workspaces
&lt;/h2&gt;

&lt;p&gt;Moving to React Native meant I had two apps — web (React + Vite) and mobile (React Native + Expo). Rather than duplicate code, I set up a monorepo with pnpm workspaces. Nothing fancy.&lt;/p&gt;

&lt;p&gt;Shared packages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Services&lt;/strong&gt; — business logic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;API clients&lt;/strong&gt; — all backend communication&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Jotai atoms&lt;/strong&gt; — shared state: rooms, chats, user cache&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web and mobile each have their own UI layer, but everything underneath is shared. State is defined once — a room list atom, a message cache atom, a user presence atom — and both platforms consume the same atoms. This means a bug fix or API change happens once, and state behavior is consistent across platforms.&lt;/p&gt;

&lt;p&gt;For styling on React Native I use NativeWind — Tailwind for React Native. Same utility classes I use on web, works on mobile. Makes the styling consistent and fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;shadcn/ui on React Native&lt;/strong&gt; — since shadcn components live in my codebase, I adapted them for React Native manually. &lt;code&gt;div&lt;/code&gt; → &lt;code&gt;View&lt;/code&gt;, &lt;code&gt;button&lt;/code&gt; → &lt;code&gt;Pressable&lt;/code&gt;, CSS → NativeWind classes. It's tedious but straightforward, and the result is consistent components across both platforms without two completely separate design systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where It Stands
&lt;/h2&gt;

&lt;p&gt;The code is messy in places. The UI is functional but not something I'm proud of visually. The monorepo structure is working well. Mobile is better than Ionic was for the specific things chat needs.&lt;/p&gt;

&lt;p&gt;The main thing I learned: frontend is a different skill set. I kept thinking better tools would solve a design problem. They didn't. What helped was accepting I needed pre-built components, adapting them rather than fighting them, and not over-engineering the state management.&lt;/p&gt;

</description>
      <category>react</category>
      <category>reactnative</category>
      <category>ionic</category>
      <category>frontend</category>
    </item>
    <item>
      <title>From Fly.io to On-Premise Kubernetes</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:39:56 +0000</pubDate>
      <link>https://dev.to/yepchaos/from-flyio-to-on-premise-kubernetes-4bj9</link>
      <guid>https://dev.to/yepchaos/from-flyio-to-on-premise-kubernetes-4bj9</guid>
      <description>&lt;p&gt;Everything works in localhost. Exposing it to the internet is a different problem. I went through Fly.io, Linode managed Kubernetes, and eventually landed on an on-premise cluster. Each step had tradeoffs in both cost and operational complexity.&lt;/p&gt;

&lt;h2&gt;
  
  
  Containers and Kubernetes
&lt;/h2&gt;

&lt;p&gt;Before getting into the details, here is the short explanation.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;container&lt;/strong&gt; is a lightweight, isolated unit that packages an application with its runtime and dependencies. Unlike virtual machines, containers share the host OS kernel, which makes them efficient in terms of startup time and resource usage. Docker is the standard tooling: define a &lt;code&gt;Dockerfile&lt;/code&gt;, build an image, and run it across environments with minimal variation.&lt;/p&gt;
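
&lt;p&gt;In practice that loop is just a couple of commands. A minimal, hypothetical example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# build an image from the Dockerfile in the current directory
docker build -t astring-backend:dev .

# run it locally, mapping a port (8080 is just an example)
docker run --rm -p 8080:8080 astring-backend:dev
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
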

&lt;p&gt;The problem: once we have multiple containers across multiple machines, managing them manually is chaos. Which machine runs what? What happens when a container crashes? How do you roll out updates without downtime?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Kubernetes&lt;/strong&gt; solves this. It's an orchestration platform — you describe what you want (3 replicas of this service, always keep them running, expose them on this port) and Kubernetes figures out how to make it happen. The key building blocks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pod&lt;/strong&gt; — the smallest unit, one or more containers running together&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deployment&lt;/strong&gt; — describes how many pods to run and how to update them&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service&lt;/strong&gt; — a stable network endpoint that routes traffic to the right pods&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ingress&lt;/strong&gt; — routes external HTTP traffic into the cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The big win: if a pod crashes, Kubernetes restarts it. If a node goes down, it reschedules pods elsewhere. You stop thinking about individual machines and start thinking about desired state.&lt;/p&gt;
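
&lt;p&gt;A minimal example of "describe what you want": a Deployment that keeps three replicas alive and a Service in front of them. Names, image, and ports are illustrative:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply -f - &amp;lt;&amp;lt;EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: demo-backend
  template:
    metadata:
      labels:
        app: demo-backend
    spec:
      containers:
        - name: backend
          image: nginx          # placeholder image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: demo-backend
spec:
  selector:
    app: demo-backend
  ports:
    - port: 80
      targetPort: 80
EOF
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Kill any of those pods and the Deployment controller brings a replacement back up — that's the desired-state loop in action.&lt;/p&gt;
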

&lt;h2&gt;
  
  
  Phase 1: Fly.io + Vercel
&lt;/h2&gt;

&lt;p&gt;For the backend I started with &lt;a href="http://Fly.io" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt;. Easy to deploy, cheap during development — I stayed under their $5 threshold. For the frontend, Vercel. Push to GitLab, it deploys automatically. Vercel still handles the frontend today, no complaints. Fly.io managed containers just fine.&lt;/p&gt;

&lt;p&gt;Fly.io was fine for stateless services. The problem was stateful ones — ScyllaDB and NATS.&lt;/p&gt;

&lt;p&gt;Running stateful services on Kubernetes properly requires &lt;strong&gt;operators&lt;/strong&gt; — controllers that understand the specific lifecycle of a piece of software. ScyllaDB has its own operator that handles cluster bootstrapping, repairs, scaling, backup, and topology changes. NATS has one too. On a platform like this, running Kubernetes operators isn't possible because you don't have access to a Kubernetes control plane, so you're stuck managing stateful services by hand. I spent more time managing the platform than building the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 2: Linode Managed Kubernetes
&lt;/h2&gt;

&lt;p&gt;I needed real Kubernetes. Linode offered $100 credit on signup, which was enough to experiment properly.&lt;/p&gt;

&lt;p&gt;The setup: 3 worker nodes (1 CPU, 2GB RAM, 50GB storage each) plus a load balancer. Linode's managed Kubernetes is free for the control plane — we only pay for nodes and networking.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 nodes × $12/month = $36&lt;/li&gt;
&lt;li&gt;Load balancer = $10/month&lt;/li&gt;
&lt;li&gt;Total: $46/month&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I used Terraform with Linode's provider to provision the cluster — infrastructure as code, version controlled, easy to redeploy. Once the cluster was up, I could run operators properly. ScyllaDB and NATS behaved the way they were supposed to.&lt;/p&gt;

&lt;p&gt;When the $100 credit ran out, $46/month was hard to justify for a project still in testing. So I started thinking about on-premise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase 3: On-Premise Kubernetes
&lt;/h2&gt;

&lt;p&gt;The same teacher from my IOI days gave me access to three VMs on his company's infrastructure. Free. Each machine had 8 cores, 8GB RAM, and 50GB storage.&lt;/p&gt;

&lt;p&gt;I set up my own Kubernetes cluster on these using &lt;strong&gt;k3s&lt;/strong&gt; — a lightweight Kubernetes distribution that works well for on-premise and resource-constrained environments. I'll write a dedicated post on the k3s setup, but the short version: it's full Kubernetes without the overhead, and it runs fine on these VMs.&lt;/p&gt;

&lt;p&gt;Full control over the environment. I can run any operator, configure networking however I need, no platform restrictions. The tradeoff is that there's no managed control plane — if something breaks at the infrastructure level, I fix it myself. That's an acceptable tradeoff: it's free, and it only costs some of my time.&lt;/p&gt;

&lt;p&gt;I deployed everything: ScyllaDB cluster, NATS, PostgreSQL, Redis, the backend services. All running on three VMs, costing nothing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Things Stand
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frontend&lt;/strong&gt;: Vercel, still works great&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backend + all services&lt;/strong&gt;: On-premise k3s cluster&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On-premise infrastructure requires more operational effort but provides full control and effectively zero cost when hardware is available. Next I'll write about the actual k3s setup — how I configured the cluster, networking, storage, and got everything running.&lt;/p&gt;

</description>
      <category>kubernetes</category>
    </item>
    <item>
      <title>Object Storage &amp; CDN Journey</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Thu, 09 Apr 2026 13:18:38 +0000</pubDate>
      <link>https://dev.to/yepchaos/object-storage-cdn-journey-27ke</link>
      <guid>https://dev.to/yepchaos/object-storage-cdn-journey-27ke</guid>
      <description>&lt;p&gt;A chat application needs reliable object storage — media uploads, backups, logs. Sounds simple, but there’s lot of choices. I went through six different solutions before landing on something that actually made sense.&lt;/p&gt;

&lt;h2&gt;
  
  
  The S3 API
&lt;/h2&gt;

&lt;p&gt;Before getting into the journey, one thing worth explaining: almost every object storage provider today implements the &lt;strong&gt;S3 API&lt;/strong&gt; — the interface originally built by AWS for their Simple Storage Service.&lt;/p&gt;

&lt;p&gt;It's a RESTful interface: buckets as containers, objects accessed by unique keys, HTTP methods for everything. The key thing is it's a standard. Providers like Wasabi, MinIO, Backblaze, Cloudflare R2 — they all speak S3. That means I can swap providers without rewriting application logic, just change the endpoint and credentials. That portability matters a lot when you're still figuring out the right fit.&lt;/p&gt;
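
&lt;p&gt;That swap is usually nothing more than pointing the client at a different endpoint. With the AWS CLI, for example (endpoints and bucket names here are placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# same command, different provider: only the endpoint and credentials change
aws s3 ls s3://my-bucket --endpoint-url https://s3.eu-central-1.wasabisys.com
aws s3 ls s3://my-bucket --endpoint-url https://&amp;lt;account_id&amp;gt;.r2.cloudflarestorage.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
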

&lt;h2&gt;
  
  
  The Provider Journey
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AWS S3
&lt;/h3&gt;

&lt;p&gt;The obvious starting point. Reliable, feature-rich, integrates with everything. I used it early on and it worked fine — but the pricing model is higher than others. I stopped using it before things got expensive.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backblaze B2
&lt;/h3&gt;

&lt;p&gt;Backblaze B2 has egress-free pricing, which sounds great. The problem: it only has American data centers. My servers and users aren't in America, so the latency was noticeable and unacceptable for a real-time chat app.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tigris (via Fly.io)
&lt;/h3&gt;

&lt;p&gt;Tigris (&lt;a href="http://fly.io/" rel="noopener noreferrer"&gt;Fly.io&lt;/a&gt;) provides globally distributed, S3-compatible storage with low latency, addressing the B2 latency limitations. However, its pricing model includes per-request charges in addition to storage. For an API-heavy workload like a chat system, this would scale poorly, so I decided not to go with it.&lt;/p&gt;

&lt;h3&gt;
  
  
  MinIO
&lt;/h3&gt;

&lt;p&gt;I actually deployed MinIO in my cluster. It's open-source, S3-compatible, and simple to run. But running it yourself means managing infrastructure, handling high availability, paying for the compute. For a small project it's overkill — I was spending more time on storage ops than on the actual product.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wasabi
&lt;/h3&gt;

&lt;p&gt;Wasabi has egress-free pricing and good performance. I settled here for a while. But there's a catch: &lt;strong&gt;Wasabi doesn't support public bucket permissions&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For private files, that's fine — I generate pre-signed URLs from the backend, the user gets a temporary link, no credentials exposed. But for public files like profile pictures, I had to build a backend service to forward them to users. Extra latency, extra backend load, not ideal.&lt;/p&gt;
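
&lt;p&gt;For reference, the pre-signed URL flow is roughly this with boto3; the bucket, key, and endpoint are illustrative, not my real backend values.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import boto3

client = boto3.client(
    "s3",
    endpoint_url="https://s3.eu-central-1.wasabisys.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# The backend signs a temporary link; credentials never reach the user.
url = client.generate_presigned_url(
    "get_object",
    Params={"Bucket": "private-uploads", "Key": "attachments/report.pdf"},
    ExpiresIn=900,  # link expires after 15 minutes
)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;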

&lt;p&gt;I made it work, but then realized a bigger problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wasabi Pricing Problem
&lt;/h2&gt;

&lt;p&gt;Wasabi charges for a minimum of &lt;strong&gt;1TB&lt;/strong&gt; regardless of how much you actually store. My total data — user uploads, database backups, cluster backups — was under 10GB. I was paying &lt;strong&gt;$8/month&lt;/strong&gt; to store 10GB. That's bad.&lt;/p&gt;

&lt;h2&gt;
  
  
  Fixing the Public File Problem First
&lt;/h2&gt;

&lt;p&gt;Before I figured out the pricing issue, I spent time solving the public file latency problem with Cloudflare caching. Worth documenting, because it works well if you're stuck on Wasabi or something similar.&lt;/p&gt;

&lt;p&gt;The setup: every public file request goes through my backend at &lt;code&gt;/api/v1/media/file/*&lt;/code&gt;. I set Cloudflare cache rules on that path — mark responses eligible for cache, force an edge TTL of 1 year, bypass backend &lt;code&gt;Cache-Control&lt;/code&gt; headers. Once a file is cached at Cloudflare's edge, it never hits my backend or Wasabi again.&lt;/p&gt;

&lt;p&gt;Here's a real cached response:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4xwpufxzwfm1fpcqr7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ft4xwpufxzwfm1fpcqr7n.png" alt=" " width="800" height="1112"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CF-Cache-Status: HIT&lt;/code&gt; — served from Cloudflare's edge, not my backend&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Age: 774&lt;/code&gt; — seconds it's been cached at the edge&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Cache-Control: max-age=31536000&lt;/code&gt; — browser caches it for 1 year too&lt;/li&gt;
&lt;/ul&gt;
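
&lt;p&gt;The same headers are easy to check from a script too; a quick sketch against a hypothetical public file URL:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import requests

# Hypothetical URL; the cache rule covers everything under /api/v1/media/file/
resp = requests.get("https://example.com/api/v1/media/file/avatar.png")

# HIT means Cloudflare's edge answered and neither the backend nor Wasabi was hit.
print(resp.headers.get("CF-Cache-Status"))
print(resp.headers.get("Age"))
print(resp.headers.get("Cache-Control"))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;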

&lt;p&gt;Zero extra cluster resources, no Wasabi bandwidth on repeat requests, low latency globally. If you're using Wasabi and hitting this problem, this approach works.&lt;/p&gt;

&lt;p&gt;Because of the fixed minimum fee, I decided to move anyway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Setup: Cloudflare R2
&lt;/h2&gt;

&lt;p&gt;Cloudflare R2 has a free tier of &lt;strong&gt;10GB&lt;/strong&gt;. My entire dataset fits in that. No egress fees, native CDN built in — so no need for the Cloudflare caching workaround above (though good to know it works). I moved everything to R2 and now pay nothing for storage.&lt;/p&gt;

&lt;p&gt;For backups, I'm keeping Backblaze B2 in mind for when data grows — egress-free and cheap for large volumes, as long as the latency to my users is acceptable for backup use cases (it is).&lt;/p&gt;

&lt;p&gt;Current state:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Cloudflare R2&lt;/strong&gt; — user uploads, all active data, everything under 10GB (free tier)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backblaze B2&lt;/strong&gt; — future home for backups once R2 free tier isn't enough&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The egress-free advantage of Wasabi turned out to be irrelevant at my scale. Under 1TB, you're just paying the minimum anyway. R2's free tier made the decision easy.&lt;/p&gt;

</description>
      <category>objectstorage</category>
      <category>cdn</category>
    </item>
    <item>
      <title>Finding the Right Database</title>
      <dc:creator>yep</dc:creator>
      <pubDate>Wed, 08 Apr 2026 14:04:20 +0000</pubDate>
      <link>https://dev.to/yepchaos/finding-rigth-database-15c1</link>
      <guid>https://dev.to/yepchaos/finding-rigth-database-15c1</guid>
      <description>&lt;p&gt;Most applications need to persist state. In a chat application, that state is massive, constantly growing, and high-frequency. The obvious starting point is a traditional RDBMS — but the specific access patterns of a real-time chat system eventually force a rethink.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem with RDBMS for Chat
&lt;/h2&gt;

&lt;p&gt;I could use PostgreSQL for storing messages. It works, until it doesn't.&lt;/p&gt;

&lt;p&gt;Chat is different from most relational data. Messages don't join to other tables. What I actually need is simple: insert a message, fetch messages by room or user. That's it. So the requirements are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It grows fast — millions, then billions of rows&lt;/li&gt;
&lt;li&gt;No joins needed — just "give me all messages for room X"&lt;/li&gt;
&lt;li&gt;Reads and writes need to be fast&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional databases like PostgreSQL and MySQL weren't designed with this access pattern as the primary use case. Here's why that matters.&lt;/p&gt;

&lt;h3&gt;
  
  
  Partitioning
&lt;/h3&gt;

&lt;p&gt;As the message table grows, we can partition it — split it into smaller physical chunks based on some key, like room ID or time range. The database only scans the relevant partition instead of the whole table. Postgres supports this natively, but it handles it differently from distributed systems — partitions still live on a single machine, so we’re organizing data, not distributing the load.&lt;/p&gt;
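
&lt;p&gt;As an illustration of what that looks like (not my actual schema), declarative range partitioning in Postgres can be set up like this via psycopg2.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import psycopg2

# Connection parameters are placeholders.
conn = psycopg2.connect("dbname=chat user=app password=secret host=localhost")
cur = conn.cursor()

# The parent table is split into monthly partitions; the planner only scans
# the relevant partition, but all partitions still live on this one instance.
cur.execute("""
    CREATE TABLE IF NOT EXISTS messages (
        room_id   uuid        NOT NULL,
        sent_at   timestamptz NOT NULL,
        sender_id uuid        NOT NULL,
        body      text        NOT NULL
    ) PARTITION BY RANGE (sent_at);

    CREATE TABLE IF NOT EXISTS messages_2026_04
        PARTITION OF messages
        FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');
""")
conn.commit()
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;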

&lt;h3&gt;
  
  
  The Write Scaling Problem
&lt;/h3&gt;

&lt;p&gt;The bigger issue is writes. PostgreSQL and MySQL use a single-master model — one node handles all writes, replicas handle reads. Every message sent goes through that one master. At high write volume, that becomes a bottleneck.&lt;/p&gt;

&lt;p&gt;The common solution is sharding: split data across multiple independent database instances, each owning a slice. Hash the room ID to decide which shard it lives on. In theory, clean. In practice, painful — managing shard keys, handling rebalancing when nodes are added, cross-shard queries becoming a nightmare. I decided early on to avoid this entirely by choosing a database built for it natively.&lt;/p&gt;
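
&lt;p&gt;For illustration only (this is the part I chose not to build): the routing step itself is just a hash, the operational pain lives in everything around it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import zlib

NUM_SHARDS = 4  # adding a shard changes this number and remaps most existing keys

def shard_for(room_id):
    # A stable hash of the room ID picks which database instance owns the room.
    return zlib.crc32(room_id.encode("utf-8")) % NUM_SHARDS

# Every read and write for a room has to be routed through this function,
# and anything that spans rooms turns into a cross-shard fan-out.
print(shard_for("room-42"))
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;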

&lt;h2&gt;
  
  
  Cassandra and ScyllaDB
&lt;/h2&gt;

&lt;p&gt;This is where wide-column stores come in: Cassandra, and its C++ reimplementation, ScyllaDB. They share the same architecture and data model; ScyllaDB is a rewrite aimed at better performance and lower latency.&lt;/p&gt;

&lt;p&gt;The core idea: instead of one master handling writes, Cassandra/ScyllaDB uses a &lt;strong&gt;ring topology&lt;/strong&gt;. Every node in the cluster owns a range of a hash space. When a message is written, the room ID gets hashed and routed to the node that owns that hash range. No single master, no write bottleneck — every node can accept writes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Replication&lt;/strong&gt; works naturally on top of this. With a replication factor of 3, a write doesn't just go to the primary node — it also goes to the next 2 nodes on the ring. So there are 3 copies of the data across different nodes. If one goes down, the data is still there. No manual failover, it's built into how the ring works.&lt;/p&gt;

&lt;p&gt;The other key advantage is the &lt;strong&gt;partition key&lt;/strong&gt;. By using room ID as the partition key, Cassandra/ScyllaDB guarantees all messages for that room are stored together on the same node. Pair that with a &lt;strong&gt;clustering key&lt;/strong&gt; on timestamp, and messages within a room are physically stored in time order — fetching history becomes one sequential read, already sorted. No ORDER BY, no extra cost.&lt;/p&gt;
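
&lt;p&gt;To make the partition key and clustering key concrete, here's an illustrative schema using the Python driver; the node address, keyspace, and column names are placeholders rather than my production setup.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import uuid
from cassandra.cluster import Cluster

# Connect to any node; the driver discovers the rest of the ring.
session = Cluster(["scylla-node-1"]).connect()

# Replication factor 3: each partition is stored on three nodes of the ring.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS chat
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")

# room_id is the partition key (all messages of a room live together on one
# node); sent_at is the clustering key (rows are physically stored in time order).
session.execute("""
    CREATE TABLE IF NOT EXISTS chat.messages (
        room_id   uuid,
        sent_at   timeuuid,
        sender_id uuid,
        body      text,
        PRIMARY KEY ((room_id), sent_at)
    ) WITH CLUSTERING ORDER BY (sent_at DESC)
""")

# Fetching history is one sequential read, already sorted, no ORDER BY needed.
rows = session.execute(
    "SELECT sender_id, body FROM chat.messages WHERE room_id = %s LIMIT 50",
    (uuid.uuid4(),),  # placeholder room ID
)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;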

&lt;p&gt;This turns random I/O into sequential I/O. Fetching chat history means finding the right node and reading one continuous stream. That's a hardware-level optimization that a single-master Postgres setup simply can't match at scale.&lt;/p&gt;

&lt;p&gt;The tradeoff: Cassandra/ScyllaDB is bad at full scans and doesn't support joins, since anything that spans partitions means hitting every node. Based on the requirements here, that doesn't matter — joins are never needed.&lt;/p&gt;

&lt;p&gt;This isn't just theory. Discord went through this exact problem — first scaling with Cassandra for billions of messages, then eventually migrating to ScyllaDB for better performance at trillions of messages. Worth reading if you want a production-scale perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://discord.com/blog/how-discord-stores-billions-of-messages" rel="noopener noreferrer"&gt;How Discord Stores Billions of Messages&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://discord.com/blog/how-discord-stores-trillions-of-messages" rel="noopener noreferrer"&gt;How Discord Stores Trillions of Messages&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I'll write a dedicated post on Cassandra/ScyllaDB internals — replication strategies, consistency levels, and multi-DC support deserve their own space.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Hybrid Architecture
&lt;/h2&gt;

&lt;p&gt;There's no perfect database. Different tools solve different problems, so I use both:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;PostgreSQL&lt;/strong&gt; — relational, "small" data: users, friend lists, room metadata. Needs ACID compliance and complex queries, but doesn't grow at a massive rate.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cassandra/ScyllaDB&lt;/strong&gt; — the heavy data: every message ever sent. High-write throughput, fast sequential reads by room, horizontally scalable without a single write master.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each database does what it's actually good at.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;There's more to cover here — consistency models, high availability, failover, and distributed systems fundamentals like Raft. I'll get into those in future posts. For now, this is the architectural reasoning behind the storage layer in ASTRING.&lt;/p&gt;

</description>
      <category>database</category>
      <category>postgres</category>
      <category>cassandra</category>
      <category>scylladb</category>
    </item>
  </channel>
</rss>
