Cheela Sathvik

Your Database Failover Doesn't Work. Ask Me How I Know.

Last year, our primary database went down during a critical traffic spike. We had failover configured. Or at least, we thought we did.

Our logic seemed solid enough on paper:

if (primary.isDown) {
  switchTo(replica);
}

Simple, right?

Wrong.

The primary wasn't "down" in the way our code expected. It wasn't refusing connections. It was just incredibly, painfully slow. Our health check, a simple ping, kept succeeding while real queries timed out.

Because our health check didn't account for latency spikes, our application kept flapping between the primary and the replica. Connections piled up. The pool exhausted. The service stalled.

It wasn't a crash. It was a slow, painful death by race condition.
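The root problem was a check that tested connectivity but not latency. Here is a minimal sketch of a check that treats "too slow" as "down". The names `withTimeout` and `isHealthy` are made up for illustration, and `probe` stands in for whatever cheap query your client supports.

```javascript
// Hypothetical helper: rejects if `probe()` takes longer than `ms` milliseconds.
function withTimeout(probe, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`health check exceeded ${ms}ms`)), ms);
  });
  // Promise.resolve().then(probe) also turns a synchronous throw into a rejection.
  return Promise.race([Promise.resolve().then(probe), timeout])
    .finally(() => clearTimeout(timer));
}

// Healthy only if the probe succeeds AND finishes inside the latency budget.
async function isHealthy(probe, ms) {
  try {
    await withTimeout(probe, ms);
    return true;
  } catch {
    return false;
  }
}
```

Note that the timeout only stops waiting for the probe. It does not cancel the underlying query, so a real check would also want a driver-level statement timeout if the client offers one.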

The fix? We wrote about 200 lines of manual "glue code" to handle:

  • Timeouts
  • Retries
  • Circuit breaking
  • Graceful failover
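To give a sense of what that glue looks like, here is a rough sketch of just the retry piece: a bounded number of attempts with capped exponential backoff. The function names and default values are assumptions for illustration, not the code we actually shipped.

```javascript
// Delay before the next attempt: base * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 100, maxMs = 2000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `operation` up to `attempts` times, sleeping between failures.
// Rethrows the last error once the attempts are used up.
async function retry(operation, { attempts = 3, baseMs = 100, maxMs = 2000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      if (attempt < attempts - 1) {
        const delay = backoffDelay(attempt, baseMs, maxMs);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}
```

Retries like this are only safe for idempotent operations, and without a circuit breaker in front of them they can add load to a database that is already struggling.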

I got tired of seeing this same brittle, manual boilerplate in every service I worked on. So I decided to solve it once and for all.

Introducing OmniDB

OmniDB is a thin orchestration library for Node.js. It’s not a new database client. It doesn't try to replace Prisma, or Mongoose, or pg. It simply orchestrates the clients you already use.

It handles the hard stuff automatically:

  • Real Health Checks: Checks that respect timeouts and latency, not just connectivity.
  • Race-Condition-Free Failover: Atomic switching when things go wrong.
  • Circuit Breakers: Built-in protection to stop cascading failures.
  • Graceful Shutdowns: Ensuring connections close cleanly.

The Problem with "Do It Yourself"

Most developers (myself included) start with something like this:

// ❌ The DIY Trap
const primary = new Client(PRIMARY_URL);
const replica = new Client(REPLICA_URL);

let active = primary;

setInterval(async () => {
    try {
        await primary.ping();
        active = primary;
    } catch (e) {
        console.warn("Primary down, using replica");
        active = replica;
    }
}, 5000);

Why this breaks:

  1. Flapping: If the primary is unstable, you switch back and forth, effectively DDoSing yourself.
  2. No Circuit Breaker: If the replica is also stressed, you'll hammer it until it breaks too.
  3. Request Drops: Requests in flight during the switch often just fail.
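The flapping problem in particular is usually tamed with hysteresis: only change state after several consecutive results disagree with the current one. Below is a minimal sketch of that idea. `HealthTracker` and its thresholds are hypothetical names chosen for this example, not part of any library.

```javascript
// Tracks health with hysteresis so a single good or bad check cannot flip the state.
class HealthTracker {
  constructor({ failThreshold = 3, recoverThreshold = 5 } = {}) {
    this.failThreshold = failThreshold;       // consecutive failures before marking unhealthy
    this.recoverThreshold = recoverThreshold; // consecutive successes before marking healthy again
    this.healthy = true;
    this.streak = 0; // consecutive results that contradict the current state
  }

  // Record one check result and return the (possibly updated) state.
  record(ok) {
    if (ok === this.healthy) {
      this.streak = 0; // result agrees with the current state, so reset the streak
      return this.healthy;
    }
    this.streak += 1;
    const needed = this.healthy ? this.failThreshold : this.recoverThreshold;
    if (this.streak >= needed) {
      this.healthy = !this.healthy;
      this.streak = 0;
    }
    return this.healthy;
  }
}
```

In the DIY loop above, you would call `record()` with each ping result and only reassign `active` when the returned state changes. Making recovery stricter than failure keeps traffic off a primary that is still wobbling.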

The OmniDB Way

Here is how you solve that same problem with OmniDB. It’s about 20 lines of code, and it handles all the edge cases we missed above.

// ✅ With OmniDB
import { Orchestrator } from 'omni-db';

const db = new Orchestrator({
  connections: {
    primary: new Client(PRIMARY_URL),
    replica: new Client(REPLICA_URL)
  },
  failover: { primary: 'replica' }, // simple declarative routing
  healthCheck: {
    interval: '30s',
    timeout: '5s', // strict timeout for checks
    checks: {
      primary: async (c) => { await c.query('SELECT 1'); return true; },
      replica: async (c) => { await c.query('SELECT 1'); return true; }
    }
  }
});

await db.connect();

// Usage:
// Automatically uses 'primary'.
// If 'primary' is unhealthy, it routes to 'replica'.
const client = db.get('primary');

Resilience by Design

OmniDB doesn't just check if a server is there; it manages the lifecycle of your availability.

Circuit Breakers

If a database starts failing repeatedly, OmniDB opens the circuit. This stops your application from waiting on dead connections and allows the database time to recover. You can use the built-in simple breaker or bring your own robust solution like opossum or cockatiel.
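If you have not built one before, here is a minimal sketch of the closed / open / half-open pattern a circuit breaker follows. This is a generic illustration under my own assumptions, not OmniDB's built-in breaker and not the opossum or cockatiel API. The class name, option names, and defaults are all hypothetical.

```javascript
// Generic circuit breaker sketch: fail fast while open, allow trial calls after a cooldown.
class CircuitBreaker {
  constructor({ failureThreshold = 5, resetAfterMs = 10000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold; // consecutive failures before opening
    this.resetAfterMs = resetAfterMs;         // cooldown before trial calls are allowed
    this.now = now;                           // injectable clock, handy for tests
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  get state() {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.resetAfterMs ? 'half-open' : 'open';
  }

  async call(operation) {
    if (this.state === 'open') {
      throw new Error('circuit open: failing fast');
    }
    try {
      const result = await operation();
      this.failures = 0;
      this.openedAt = null; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed trial call, or too many consecutive failures, (re)opens the circuit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```

This sketch lets every caller through while half-open. A production breaker typically admits a single trial request at a time, which is one reason to reach for a maintained library.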

Events

You shouldn't have to guess what your database is doing. OmniDB emits events for everything:

db.on('failover', ({ primary, backup }) => {
  alertOps(`🚨 Switched from ${primary} to ${backup}`);
});

db.on('recovery', ({ primary, backup }) => {
  console.log(`✅ Recovered ${primary}, switching back from ${backup}`);
});

Why Use OmniDB?

  • Zero Dependencies: Just the core library. No bloat.
  • Scalable: O(1) lookup performance, making it efficient even with hundreds of database shards/tenants.
  • Agnostic: Works with Prisma, Drizzle, Redis, MongoDB, Postgres, or any other client.
  • TypeScript Native: Full type inference. db.get('redis') returns a Redis client type.

Stop Writing Boilerplate

Infrastructure code should be boring. It should be standard. And it should work.

Stop wrestling with race conditions in your setInterval loops. Start orchestrating.

Check out the repo on GitHub:
👉 https://github.com/sathvikc/omni-db


If you've ever battled a "silent failure" in production, let me know in the comments. I'd love to hear your war stories.
