I was working on iTicket.AZ — a backend service for real-time event ticketing, built with Node.js and TypeScript — when I came across a job posting at a major bank. Their requirement: "build scalable, resilient, and fault-tolerant applications."
I looked at my own backend and asked honestly: is this fault-tolerant? The answer was no. The server had no health awareness, no service discovery, no restart policy, and no automated build verification. This post is about exactly what I fixed — with real code from the project.
Problem 1 — The backend had no health awareness
When the database went down, the backend kept accepting HTTP requests and silently failing all of them. No signal to any external system.
```typescript
app.get("/api/v1/health", async (_req, res) => {
  const dbOk = await AppDataSource.query("SELECT 1")
    .then(() => true)
    .catch(() => false);

  res.status(dbOk ? 200 : 503).json({
    status: dbOk ? "healthy" : "degraded",
    checks: {
      database: dbOk ? "up" : "down",
      uptime: Math.floor(process.uptime()),
    },
    service: "iticket-api",
    timestamp: new Date().toISOString(),
  });
});
```
A 200 means the service and its database are both reachable. A 503 means the process is alive but impaired. Any infrastructure tool — Consul, a load balancer, Kubernetes — can now make routing decisions based on this response.
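One subtle failure mode the plain `SELECT 1` probe does not cover: a database that hangs instead of erroring will hang the health endpoint with it. A minimal sketch of a bounded probe — `withTimeout` is a hypothetical helper, not part of the original project:

```typescript
// Bound a promise: if it does not settle within `ms`, resolve to `fallback`.
// A hung DB connection then reads as "down" instead of stalling /health.
async function withTimeout<T>(p: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer!));
}

// Usage inside the handler (sketch):
// const dbOk = await withTimeout(
//   AppDataSource.query("SELECT 1").then(() => true).catch(() => false),
//   2000,
//   false, // a slow probe is treated the same as a failed one
// );
```

The fallback of `false` is deliberate: for a health check, "too slow to answer" and "down" should produce the same 503.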
Problem 2 — No service discovery
Even with a health endpoint, nothing was calling it. HashiCorp Consul polls /api/v1/health every 10 seconds. If it receives a 503, it marks the instance critical and deregisters it after 30 seconds — automatically.
```typescript
async function registerWithConsul(): Promise<void> {
  try {
    await consulClient.agent.service.register({
      name: "iticket-api",
      address: "api",
      port: Number(appConfig.PORT),
      check: {
        name: "iticket-api health",
        http: `http://api:${appConfig.PORT}/api/v1/health`,
        interval: "10s",
        timeout: "3s",
        deregistercriticalserviceafter: "30s",
      },
    } as any);
  } catch (err) {
    // Non-fatal — app starts normally in local dev without Consul
    console.warn("Consul registration skipped:", err);
  }
}
The try/catch is intentional. Making the registration failure non-fatal is itself a fault-tolerance decision: the monitoring layer going down should not take the application with it.
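One refinement worth considering: if the Consul agent is merely slow to start rather than absent, giving up on the first failure loses registration permanently. A sketch of a generic retry-with-backoff wrapper — the `retry` helper is hypothetical, not from the original project, and it assumes a variant of the registration function that rethrows instead of swallowing the error:

```typescript
// Retry an async operation with exponential backoff before giving up.
async function retry<T>(
  fn: () => Promise<T>,
  attempts: number,
  baseDelayMs: number,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      // Backoff: baseDelayMs, then 2x, 4x, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
    }
  }
  throw lastErr;
}

// Usage (sketch) — still non-fatal overall, but tolerant of a slow agent:
// retry(() => registerWithConsulOrThrow(), 5, 1000).catch((err) =>
//   console.warn("Consul registration gave up after retries:", err),
// );
```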
Problem 3 — A single crash meant a permanent outage
```yaml
services:
  api:
    build: .
    restart: unless-stopped
    depends_on:
      postgres:
        condition: service_healthy
    environment:
      DB_HOST: postgres

  postgres:
    image: postgres:16-alpine
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U ${DB_USERNAME} -d ${DB_NAME}"]
      interval: 10s
      retries: 5
    restart: unless-stopped

  consul:
    image: hashicorp/consul:1.18
    ports:
      - "8500:8500"
    restart: unless-stopped
```
PostgreSQL must pass its own health check before the API starts. If the API crashes, Docker restarts it. Consul monitors it continuously.
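One gap worth noting: only `postgres` has a Docker-level health check here, so Docker knows the API process is running but not whether it is serving. A hedged sketch of a matching check for the API service — it assumes the API listens on port 3000 and that `curl` is available inside the image (neither is stated in the original):

```yaml
  api:
    healthcheck:
      # -f makes curl exit non-zero on a 503, so "degraded" counts as unhealthy
      test: ["CMD", "curl", "-f", "http://localhost:3000/api/v1/health"]
      interval: 10s
      timeout: 3s
      retries: 3
```

With this in place, other services could use `condition: service_healthy` against the API the same way the API already waits on PostgreSQL.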
Problem 4 — Broken builds reached the repository undetected
```yaml
name: iTicket.AZ CI
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: '20', cache: 'npm' }
      - run: npm ci
      - run: npx tsc --noEmit

  docker:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: docker build -t iticket-api:${{ github.sha }} .
```

Two steps from an earlier draft were redundant and are gone: installing TypeScript after `npm ci` (it belongs in `devDependencies`, which `npm ci` already installs), and running `tsc` a second time without `--noEmit` (the type check is identical; CI does not need the emitted output).
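A possible extension, sketched here as an assumption rather than part of the original pipeline: a smoke test that the built image actually starts and answers HTTP. It assumes the API listens on port 3000; a 503 is acceptable because no database runs in CI, so the check only asserts the process is up and responding.

```yaml
      # Hypothetical extra step after the build: run the image briefly and
      # confirm the health endpoint answers at all (any HTTP status is fine).
      - run: |
          docker run -d --name smoke -p 3000:3000 iticket-api:${{ github.sha }}
          sleep 5
          curl -s -o /dev/null http://localhost:3000/api/v1/health
          docker rm -f smoke
```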
How the pieces connect
```
git push
  → GitHub Actions: type check + Docker build
      ↓ passes
  → docker-compose up
      PostgreSQL → health check passes
      ↓ healthy
      iticket-api → registers with Consul
      ↓
      Consul polls /api/v1/health every 10s
          503   → marks critical → deregisters after 30s
          crash → Docker restarts automatically
```
A health endpoint is useless without something polling it. Consul is useless without something to register with it. A restart policy is useless if the app starts before the database is ready. The pieces only become fault-tolerant as a system when they are connected.