Solving the 1MiB ConfigMap Limit in Kubernetes


If you have built a Kubernetes Operator in 2025, you eventually hit the "State Problem."

You start simple: storing configuration in ConfigMaps. It works perfectly until it doesn't. Perhaps you are managing a database cluster, and the cluster topology data grows. Suddenly, you hit the 1 MiB limit of Kubernetes ConfigMaps. Splitting data across multiple ConfigMaps becomes a nightmare of race conditions and unmanageable YAML.

You need a durable, writable store that is accessible by all replicas of your operator.

In this article, we explore how to move beyond ConfigMaps by embedding a distributed, Raft-based SQLite database directly into your Go operator. We will cover the architecture, resource overhead, and provide a complete code example.

The Challenge: 3 Nodes, 1 State

Imagine you are running a high-availability Operator deployment with 3 replicas to ensure leadership election and fault tolerance.

If you just write to a local sqlite.db file on the pod's disk:

  • No Sync: Node A writes data, but Node B and Node C never see it.
  • Data Loss: If Node A crashes and gets rescheduled, the local file is lost (unless you use PVs, but even then, the new pod might not get the old volume).
  • Corruption: You cannot simply mount a shared file system (like NFS) and have three SQLite instances write to it simultaneously. SQLite does not support concurrent writers, so the file locks will fight and the database will likely become corrupted.

We need a solution that is durable, synchronized, and lightweight.

Core Requirements for Operator State

Before looking at tools, we must define what a robust operator state store requires in a Kubernetes environment:

  • Strong Consistency: When managing infrastructure (like a database cluster), two replicas cannot have different views of the truth. We need a system that ensures all nodes agree on the state before proceeding.
  • High Availability: The store must survive the loss of a pod. In a 3-node setup, the system should remain fully operational even if one node is down.
  • Minimal Footprint: Kubernetes operators often run in resource-constrained environments. The database should not require massive CPU or RAM overhead that eclipses the operator's actual logic.
  • Zero-Dependency Architecture: Ideally, the solution should not require an external service (like a managed database) or a complex sidecar. Adding external components increases the complexity and the number of edge cases that need to be handled. A self-contained binary simplifies lifecycle management and reduces networking overhead.
  • Relational Capabilities: While Key-Value stores are common, having the ability to perform SQL joins and complex queries on cluster metadata significantly simplifies operator logic.

The Landscape of Solutions

Before writing custom code, we evaluated the standard architectural patterns for this problem.

1. The Sidecar Approach (rqlite / LiteFS)

You can run a database process alongside your operator container.

  • rqlite: A distributed database that uses SQLite as its engine. It uses HTTP for queries and handles Raft consensus for you.
  • LiteFS: A FUSE-based file system that replicates SQLite files across nodes by intercepting writes.

Verdict: While robust, sidecars introduce "lifecycle entanglement." You must ensure the sidecar is healthy before the operator starts, handle local network latency between containers, and manage double the resource requests/limits per pod. It also complicates kubectl logs and debugging as you're monitoring two distinct processes per replica.

2. The "Kubernetes Native" Approach (etcd)

K8s uses etcd, so why shouldn't you?

Verdict: Using the cluster's internal etcd (via the K8s API) brings you back to the 1MiB limit per object and strict rate limiting. Running your own etcd cluster inside the operator’s namespace is an option, but etcd is notoriously sensitive to disk latency and requires significant "babysitting" (backups, defragmentation, and member management). Furthermore, you lose the ability to perform relational queries, forcing you to implement complex indexing in your Go code.

3. External Database Service (Managed RDS / Self-hosted Postgres)

You could connect the operator to an external database like PostgreSQL or MySQL.

Verdict: This moves the state outside the cluster's blast radius, but introduces significant networking hurdles. You must manage VPC peering, Subnet routing, and IAM roles or Kubernetes Secrets for credentials. If the operator is running in a restricted environment (like an air-gapped cluster), an external DB might be physically unreachable. Additionally, the latency of a cross-network SQL query can slow down the reconciliation loop compared to a locally-embedded store.

4. The Embedded Approach (Go + Raft + SQLite)

Since Kubernetes Operators are typically written in Go, we can embed the distribution logic directly into the binary using libraries that integrate Raft consensus with the SQLite driver.

Verdict: This solution fits perfectly given the requirements. It creates a single, self-healing binary that manages its own replication. There are no extra containers to patch, no external credentials to rotate, and it leverages the same Persistent Volumes already assigned to the operator pods.

The Solution: Embedded Raft Consensus

We chose an approach using an embeddable library (like Hiqlite or Dqlite) that bundles:

  • SQLite: For SQL storage.
  • Raft: For consensus (ensuring all 3 nodes agree on the data).
  • HTTP/TCP Transport: To replicate logs between nodes.

How it handles "Simultaneous" Writes

A common concern is concurrency. If operator Node A manages "Cluster X" and operator Node B manages "Cluster Y", and they write simultaneously, what happens?

Distributed SQLite utilizes Serialized Writes. Even if requests come in parallel, the Raft Leader ingests them, orders them in a log, and applies them sequentially.

  • Throughput: While this sounds slow, Raft can handle hundreds of operations per second—far more than what a typical Operator needs.
  • Consistency: Writes are atomic, meaning Node C never sees a 'partial' transaction. Reads can be configured as Strong (guaranteed latest data from Leader) or Stale (fast local reads), giving you flexibility between correctness and performance.
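
To make the write path concrete, here is a minimal sketch that mirrors the hypothetical client API used in the full example later in this article. The QueryRowStrong and QueryRowStale method names are illustrative only, not a real library's API; check your chosen library for its actual consistency options.

package main

import (
    "context"
    "log"

    // Same hypothetical Raft/SQLite client as the full example below.
    "github.com/sebadob/hiqlite"
)

// demoSerializedWrites can be called from any replica concurrently. The
// embedded library forwards every write to the current Raft leader, which
// appends it to the log and applies entries strictly in order, so two
// operator pods never observe a half-applied statement.
func demoSerializedWrites(db *hiqlite.Client) {
    ctx := context.Background()

    // Write: forwarded to the leader and serialized in the Raft log.
    if err := db.Execute(ctx,
        "UPDATE valkey_clusters SET status = ? WHERE id = ?",
        "Scaling", "cluster-x"); err != nil {
        log.Printf("write failed: %v", err)
    }

    var status string

    // Strong read (illustrative method name): verified against the leader's
    // latest committed index, so it is guaranteed to see the write above.
    if err := db.QueryRowStrong(ctx,
        "SELECT status FROM valkey_clusters WHERE id = ?",
        "cluster-x").Scan(&status); err == nil {
        log.Printf("strong read: %s", status)
    }

    // Stale read (illustrative method name): answered from the local SQLite
    // replica. Fast, but it may lag the leader by a few log entries.
    if err := db.QueryRowStale(ctx,
        "SELECT status FROM valkey_clusters WHERE id = ?",
        "cluster-x").Scan(&status); err == nil {
        log.Printf("stale read: %s", status)
    }
}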

Resource Overhead

Operators must be lightweight. Here is the estimated overhead of embedding a Raft/SQLite node:

  • CPU: Negligible when idle. During consensus and log replication (writes), expect spikes to 100-200m (millicores) as nodes serialize log entries, sync them to disk, and exchange them over the network.
  • Memory:
    • Baseline: ~64MiB (Estimated based on standard Go runtime + Raft log cache + SQLite page cache).
    • Under Load: 256MiB - 512MiB (depending on caching strategy and query complexity).
  • Storage: Minimal. The Raft log is compacted into SQLite snapshots periodically.

Implementation: A Go-Based Stateful Operator

Below is a complete example using a hypothetical integration of hiqlite (a representative library for this pattern) to create a self-healing 3-node cluster.

Prerequisites

  • StatefulSet: You must deploy this as a StatefulSet so pods get stable names (operator-0, operator-1).
  • Headless Service: To allow pods to resolve each other's IPs by DNS.
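
Because StatefulSet pods get predictable names and the Headless Service gives each one a stable DNS entry, the peer list can be derived instead of hardcoded. Below is a minimal sketch; it assumes SET_NAME, SERVICE_NAME, NAMESPACE, and REPLICAS are injected as environment variables (hypothetical names, e.g. via the Downward API or your Helm chart):

package main

import (
    "fmt"
    "log"
    "os"
    "strconv"
)

// buildPeerList derives every replica's stable address from the StatefulSet
// naming convention: {statefulset}-{ordinal}.{service}.{namespace}.svc.cluster.local
func buildPeerList() ([]string, error) {
    setName := os.Getenv("SET_NAME")     // e.g. "my-operator"
    service := os.Getenv("SERVICE_NAME") // e.g. "operator-svc" (headless)
    namespace := os.Getenv("NAMESPACE")  // e.g. "default"

    replicas, err := strconv.Atoi(os.Getenv("REPLICAS"))
    if err != nil {
        return nil, fmt.Errorf("REPLICAS must be an integer: %w", err)
    }

    peers := make([]string, 0, replicas)
    for i := 0; i < replicas; i++ {
        peers = append(peers, fmt.Sprintf("%s-%d.%s.%s.svc.cluster.local:8080",
            setName, i, service, namespace))
    }
    return peers, nil
}

func main() {
    peers, err := buildPeerList()
    if err != nil {
        log.Fatalf("failed to build peer list: %v", err)
    }
    for _, p := range peers {
        fmt.Println(p)
    }
}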

The Code

package main

import (
    "context"
    "fmt"
    "log"
    "os"
    "time"

    // Replace with your chosen Raft/SQLite library
    "github.com/sebadob/hiqlite" 
)

// ClusterData represents the schema for our Valkey clusters
type ClusterData struct {
    ID        string
    Status    string
    NodeCount int
}

func main() {
    // 1. Identity & Discovery
    // In K8s StatefulSets, POD_NAME is stable (e.g., "my-operator-0")
    nodeID := os.Getenv("POD_NAME")
    if nodeID == "" {
        log.Fatal("POD_NAME env var is required")
    }

    // Define the peers. In a real operator, you might generate this string 
    // based on the Replicas count in your Helm chart.
    peers := []string{
        // Format: {pod_name}.{service_name}.{namespace}.svc.cluster.local:{port}
        "my-operator-0.operator-svc.default.svc.cluster.local:8080",
        "my-operator-1.operator-svc.default.svc.cluster.local:8080",
        "my-operator-2.operator-svc.default.svc.cluster.local:8080",
    }

    // 2. Initialize the Distributed DB
    // This starts the Raft listener and SQLite engine
    db, err := hiqlite.New(hiqlite.Config{
        NodeId:   nodeID,
        Address:  fmt.Sprintf("%s:8080", nodeID), // Listen on this pod's network
        DataDir:  "/var/lib/operator/data",      // Must be a PersistentVolume
        Members:  peers,
        Secret:   "cluster-shared-secret",       // basic security
    })
    if err != nil {
        log.Fatalf("Failed to initialize distributed store: %v", err)
    }

    // 3. Schema Migration (Idempotent)
    // Usually only the Raft leader executes this, but the library handles forwarding.
    initSchema(db)

    // 4. Start the Operator Loop
    go runOperatorLoop(db, nodeID)

    // Keep main process alive
    select {}
}

func initSchema(db *hiqlite.Client) {
    ctx := context.Background()
    query := `
    CREATE TABLE IF NOT EXISTS valkey_clusters (
        id TEXT PRIMARY KEY,
        status TEXT,
        node_count INTEGER,
        updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
    );`

    if err := db.Execute(ctx, query); err != nil {
        log.Printf("Schema init warning: %v", err)
    }
}

func runOperatorLoop(db *hiqlite.Client, nodeID string) {
    // Simulate reconciliation loop
    for {
        time.Sleep(10 * time.Second)

        // WRITE OPERATION
        // We insert/update state. If this node is a Follower, 
        // the library forwards the write to the Leader transparently.
        err := db.Execute(context.Background(), 
            "INSERT OR REPLACE INTO valkey_clusters (id, status, node_count) VALUES (?, ?, ?)",
            "cluster-primary", "Healthy", 5)

        if err != nil {
            log.Printf("[%s] Failed to sync state: %v", nodeID, err)
        } else {
            log.Printf("[%s] State synced successfully via Raft", nodeID)
        }

        // READ OPERATION
        // Reads can be strongly consistent (via Leader) or stale (local)
        // depending on configuration.
        var status string
        var count int
        row := db.QueryRow(context.Background(), 
            "SELECT status, node_count FROM valkey_clusters WHERE id = ?", 
            "cluster-primary")

        if err := row.Scan(&status, &count); err == nil {
            fmt.Printf("[%s] Current World State: Status=%s, Nodes=%d\n", nodeID, status, count)
        }
    }
}

Key Takeaways for Production

  • Persistence is Mandatory: Even though Raft replicates data, you must use PersistentVolumes (PVCs) for the underlying storage directory (/var/lib/operator/data). If the entire cluster restarts, in-memory data is lost. The PVC ensures the Raft log survives.
  • Handling Failures: If one node goes down, the other two continue to operate (Quorum = 2). When the failed node comes back, it will automatically "catch up" by downloading the missing logs or a full snapshot from the leader.
  • Readiness Probes: Don't mark your operator pod as "Ready" until the DB has joined the Raft cluster. This prevents K8s from routing traffic to a node that isn't synced yet. When a new pod joins (e.g., during a scale-up or replacement), it will start in a "Catch-up" state, replaying the Raft log from the leader until its local SQLite state matches the cluster consensus.

During this catch-up phase:

  • Writes: Any write request initiated by the new node will immediately work because the library transparently forwards the command to the current cluster Leader.
  • Reads: Stale local reads are available immediately but will return outdated data. Strongly consistent reads will only work once the node has joined the Raft group and synchronized its state, as they require a round-trip to the Leader to verify the latest index.

Only once this synchronization is complete should the readiness probe pass (or, depending on your business logic, the operator should explicitly wait for the sync to finish), ensuring that it never reconciles against a potentially stale local view.
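
One way to wire this up is a small health endpoint that your pod's readinessProbe targets. The sketch below assumes the library exposes some way to ask whether the local node has joined the Raft group and caught up; IsCaughtUp is an illustrative placeholder, not a documented API.

package main

import (
    "net/http"

    // Same hypothetical Raft/SQLite client as the main example.
    "github.com/sebadob/hiqlite"
)

// registerReadiness exposes /readyz for the pod's readinessProbe. It only
// returns 200 once the local node has joined the Raft group and replayed the
// log up to the leader's committed index; until then the pod stays out of
// rotation and the operator should hold off on reconciling.
func registerReadiness(db *hiqlite.Client) {
    http.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
        // IsCaughtUp stands in for your library's actual health or
        // replication-lag check.
        if db.IsCaughtUp() {
            w.WriteHeader(http.StatusOK)
            return
        }
        w.WriteHeader(http.StatusServiceUnavailable)
    })

    // Serve probes on a separate port from the Raft transport.
    go http.ListenAndServe(":8081", nil)
}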

Why Not Use a Standard Deployment?

While it is technically possible to run this architecture using a standard Kubernetes Deployment, it introduces significant operational complexity. If you choose to avoid StatefulSets, you must manually manage the following:

  • Quorum Management & Membership Changes: Raft requires a majority (Quorum) to perform any action, including removing a dead node. In a Deployment, if a pod dies and a new one starts with a random name, the cluster size effectively increases. If you don't explicitly remove the "old" node identity, you risk losing quorum during subsequent failures because the leader will keep trying to contact a node that no longer exists.
  • Identity-to-Storage Mapping: Standard Deployments do not guarantee which pod gets which Persistent Volume. You would need to write custom logic to ensure a new pod can find and mount the specific volume containing its previous Raft log and SQLite state.
  • Dynamic Peer Discovery: Without the stable DNS names provided by a Headless Service and StatefulSet (e.g., operator-0.svc), your nodes must constantly query the Kubernetes API to discover the current IPs of their peers and update the Raft membership list dynamically, which is prone to race conditions during split-brain scenarios.

StatefulSets simplify this by providing stable hostnames and predictable volume bindings, allowing your operator to focus on business logic rather than cluster coordination plumbing.

Conclusion

Moving your Operator's state from ConfigMaps to a distributed SQLite instance allows you to scale beyond the 1MiB limit while maintaining the simplicity of a single Go binary. By leveraging libraries like Hiqlite or Dqlite, you gain SQL capabilities, strong consistency, and high availability, making your Operator robust enough for critical production workloads.


Build More Resilient Systems with Aonnis

If you're managing complex caching layers and want to avoid the pitfalls of manual scaling and configuration, check out the Aonnis Valkey Operator. It helps you deploy and manage high-performance, Valkey-compatible clusters on Kubernetes with built-in best practices for reliability and scale. It is free for a limited time.

Visit www.aonnis.com to learn more. If you need a feature that is not yet available, let us know at support@aonnis.com and we will try to ship it within two weeks, depending on its complexity.
