TL;DR: I run WhatsApp automation workflows for 50+ businesses on shared infrastructure using n8n (queue mode), WAHA (unofficial WhatsApp Web API), Supabase, and Chatwoot. This is the technical deep-dive into how the multi-tenant architecture works, the problems I solved, and what it costs.
The Problem
I'm a solo automation engineer based in Israel. My clients are mostly small-to-medium businesses that need WhatsApp automation — appointment reminders, lead qualification, order confirmations, customer support bots. Each client has different workflows, different WhatsApp numbers, and different business logic.
The naive approach is spinning up a separate n8n instance per client. That works for 3 clients. At 50+, you're managing 50 Docker stacks, 50 PostgreSQL databases, 50 sets of credentials. Updates become a nightmare. Monitoring becomes impossible.
I needed a single n8n instance that could:
- Handle webhooks from 50+ WhatsApp sessions simultaneously
- Route messages to the correct workflow per client
- Not let one client's heavy traffic block another's
- Survive session disconnects and reconnections gracefully
- Cost less than $15/month per client in infrastructure
Here's how I built it.
Architecture Overview
┌─────────────────┐
│ WhatsApp Web │
│ (50+ sessions) │
└────────┬────────┘
│
┌────────▼────────┐
│ WAHA │
│ (GOWS engine) │
│ Docker + Redis │
└────────┬────────┘
│ webhooks
┌────────▼────────┐
│ Caddy │
│ (reverse proxy) │
└────────┬────────┘
│
┌──────────────▼──────────────┐
│ n8n │
│ ┌─────────┐ ┌──────────┐ │
│ │ Main │ │ Worker │ │
│ │ Process │ │ (concur. │ │
│ │ (UI + │ │ = 10) │ │
│ │ webhook)│ │ │ │
│ └────┬────┘ └─────┬─────┘ │
│ │ │ │
│ ┌────▼─────────────▼────┐ │
│ │ Redis │ │
│ │ (job queue + pub) │ │
│ └───────────────────────┘ │
└──────────────┬───────────────┘
│
┌──────────────▼──────────────┐
│ PostgreSQL 16 │
│ (n8n workflows + data) │
└─────────────────────────────┘
│
┌──────────────▼──────────────┐
│ Supabase │
│ (client data, CRM, logs) │
└─────────────────────────────┘
│
┌──────────────▼──────────────┐
│ Chatwoot │
│ (customer support inbox) │
└─────────────────────────────┘
Five containers run the core n8n stack:
- n8n (main process) — handles the UI and receives webhooks
- n8n-worker — executes workflows, concurrency set to 10
- n8n-postgres — PostgreSQL 16 for workflow storage
- n8n-redis — Redis 7 for the job queue
- caddy — reverse proxy with automatic HTTPS
WAHA runs separately with its own Redis instance, managing all WhatsApp Web sessions.
n8n Queue Mode: Why It Matters
By default, n8n runs in "regular" mode — the same process that serves the UI also executes workflows. If a webhook comes in while a heavy workflow is running, the webhook handler blocks.
Queue mode splits this into two processes:
# docker-compose.yml (simplified)
services:
  n8n:
    image: n8nio/n8n:latest
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=n8n-redis
      - QUEUE_BULL_REDIS_PORT=6379
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=n8n-postgres
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
      # Main and workers must share the same encryption key,
      # or workers can't decrypt stored credentials
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
      - EXECUTIONS_DATA_MAX_AGE=168
      - EXECUTIONS_DATA_SAVE_ON_SUCCESS=all
      - EXECUTIONS_DATA_PRUNE_MAX_COUNT=50000
    networks:
      - n8n-net

  n8n-worker:
    image: n8nio/n8n:latest
    command: worker
    environment:
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=n8n-redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_HEALTH_CHECK_ACTIVE=true
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=n8n-postgres
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=n8n
      - DB_POSTGRESDB_PASSWORD=${POSTGRES_PASSWORD}
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY}
    networks:
      - n8n-net

  n8n-postgres:
    image: postgres:16-alpine
    environment:
      - POSTGRES_DB=n8n
      - POSTGRES_USER=n8n
      - POSTGRES_PASSWORD=${POSTGRES_PASSWORD}
    volumes:
      - n8n-db-data:/var/lib/postgresql/data
    networks:
      - n8n-net

  n8n-redis:
    image: redis:7-alpine
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    networks:
      - n8n-net

  caddy:
    image: caddy:2-alpine
    ports:
      - "80:80"
      - "443:443"
    volumes:
      - ./Caddyfile:/etc/caddy/Caddyfile
    networks:
      - n8n-net

volumes:
  n8n-db-data:

networks:
  n8n-net:
The main process receives all webhooks and pushes jobs to the Redis queue. The worker picks them up with a concurrency of 10, meaning up to 10 workflows execute simultaneously. If more come in, they queue — no dropped messages.
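The receive-enqueue-drain split can be sketched in plain JavaScript. This is an illustrative in-memory toy, not n8n's actual Bull internals — the point is that the receiver never executes work, it only enqueues:

```javascript
// Toy model of queue mode: the receiver only enqueues; the worker
// drains the queue with a fixed concurrency. Illustrative only --
// n8n uses Bull on Redis, not an in-memory array.
class JobQueue {
  constructor(concurrency) {
    this.concurrency = concurrency;
    this.jobs = [];       // waiting jobs
    this.active = 0;      // currently "executing"
    this.processed = [];
  }
  enqueue(job) {          // what the main process does per webhook
    this.jobs.push(job);
    this.drain();
  }
  drain() {               // what the worker does, up to `concurrency` at once
    while (this.active < this.concurrency && this.jobs.length > 0) {
      const job = this.jobs.shift();
      this.active++;
      Promise.resolve().then(() => {
        this.processed.push(job);
        this.active--;
        this.drain();     // pull the next waiting job
      });
    }
  }
}

const q = new JobQueue(10);
for (let i = 0; i < 25; i++) q.enqueue({ id: i });
// At most 10 jobs are in flight; the other 15 wait in the queue.
console.log(q.active, q.jobs.length); // → 10 15
```

Nothing is dropped: the 15 waiting jobs are picked up as the active ones finish, which is exactly the behavior that matters when a burst of webhooks arrives.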
Execution Retention
I keep 7 days of full execution history (EXECUTIONS_DATA_MAX_AGE=168 hours) with a hard cap at 50,000 executions. This is critical for debugging client issues. When a client says "the bot didn't respond yesterday at 3 PM," I can pull up the exact execution, see the input payload, and trace where it failed.
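When chasing a report like that, I don't click through the UI — n8n's public REST API can filter executions by status and workflow. A sketch of building such a query (the `/api/v1/executions` endpoint and `X-N8N-API-KEY` header come from n8n's public API; the base URL, key, and workflow ID here are placeholders):

```javascript
// Build a request against n8n's public API to list failed executions
// of a given workflow. Kept as a pure function so it's testable;
// actually sending it is a plain fetch() with these values.
function failedExecutionsRequest(baseUrl, apiKey, workflowId, limit = 20) {
  const params = new URLSearchParams({
    status: 'error',          // only failed runs
    workflowId,               // the client's workflow
    limit: String(limit),
  });
  return {
    url: `${baseUrl}/api/v1/executions?${params}`,
    headers: { 'X-N8N-API-KEY': apiKey },
  };
}

const req = failedExecutionsRequest('https://n8n.example.com', 'key123', '42');
console.log(req.url);
// → https://n8n.example.com/api/v1/executions?status=error&workflowId=42&limit=20
```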
WAHA: The WhatsApp Gateway
WAHA (WhatsApp HTTP API) is an unofficial, open-source WhatsApp Web API. It wraps the WhatsApp Web protocol in a REST API with webhook support. I use the GOWS engine (Go-based, not the Node.js WEBJS engine) because it's significantly more stable for long-running sessions.
Multi-Session Setup
WAHA supports multiple WhatsApp sessions in a single container. Each client gets their own session, identified by a session name:
# Start a new session for a client
curl -X POST http://localhost:3000/api/sessions/start \
  -H 'Content-Type: application/json' \
  -H "X-API-KEY: ${WAHA_API_KEY}" \
  -d '{
    "name": "client_acme_corp",
    "config": {
      "webhooks": [{
        "url": "https://n8n.example.com/webhook/waha-gateway",
        "events": [
          "message",
          "message.ack",
          "session.status"
        ]
      }]
    }
  }'
Key detail: all sessions send webhooks to the same n8n endpoint. The routing happens inside n8n, not at the WAHA level. This is deliberate — it means I can add a new client without touching the webhook infrastructure.
Session Persistence
WAHA stores session data on disk. The Docker volume mapping is critical:
# WAHA docker-compose.yaml
services:
  waha:
    image: devlikeapro/waha:latest
    environment:
      - WHATSAPP_DEFAULT_ENGINE=GOWS
      - WAHA_DASHBOARD_ENABLED=true
      - WHATSAPP_RESTART_ALL_SESSIONS=true
    volumes:
      - /opt/waha/sessions:/app/.sessions
      - /opt/waha/media:/app/.media
    ports:
      - "3000:3000"
WHATSAPP_RESTART_ALL_SESSIONS=true means that when the container restarts (deploy, crash, server reboot), all sessions automatically reconnect. Without this, you'd need to manually re-scan QR codes for 50+ phones.
The /opt/waha/sessions directory contains the session auth data. Back this up. If you lose it, every client needs to re-scan their QR code.
Webhook Routing: The Heart of Multi-Tenancy
Every incoming WhatsApp message hits the same webhook endpoint. The n8n workflow needs to figure out which client it belongs to and route it to the correct handler.
The Gateway Workflow
// n8n Function node: "Route by Session"
const sessionName = $input.first().json.session;
const event = $input.first().json.event;
const payload = $input.first().json.payload;

// Extract client identifier from session name
// Convention: "client_{slug}" -> slug is the routing key
const clientSlug = sessionName.replace('client_', '');

// Check the in-memory cache first (workflow static data persists
// across executions within the same process)
const clientConfig = $getWorkflowStaticData('global');

if (!clientConfig[clientSlug]) {
  // Cache miss -- fetch the client config from Supabase.
  // Note: PostgREST returns a bare JSON array, not { data }.
  const data = await this.helpers.httpRequest({
    method: 'GET',
    url: `${process.env.SUPABASE_URL}/rest/v1/clients`,
    qs: { slug: `eq.${clientSlug}`, select: '*' },
    headers: {
      'apikey': process.env.SUPABASE_SERVICE_KEY,
      'Authorization': `Bearer ${process.env.SUPABASE_SERVICE_KEY}`
    },
    json: true
  });
  if (data && data.length > 0) {
    clientConfig[clientSlug] = data[0];
  }
}

const client = clientConfig[clientSlug];
if (!client) {
  console.log(`Unknown session: ${sessionName}`);
  return []; // Drop unknown sessions silently
}

return [{
  json: {
    client_id: client.id,
    client_slug: clientSlug,
    workflow_id: client.active_workflow_id,
    phone: payload.from,
    message: payload.body,
    timestamp: payload.timestamp,
    session: sessionName,
    raw: payload
  }
}];
The routing key is the session name. When I onboard a new client, I:
- Create a row in the Supabase `clients` table with their config
- Start a WAHA session named `client_{slug}`
- Build their specific workflow in n8n
- Set `active_workflow_id` in their client config
The gateway workflow then calls the client's specific workflow using n8n's "Execute Workflow" node, passing the parsed message data.
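The routing convention itself is worth unit-testing. A sketch of the slug extraction with a guard against malformed session names (the helper name is mine, not an n8n built-in):

```javascript
// Extract the routing slug from a WAHA session name following the
// "client_{slug}" convention. Returns null for names that don't
// match, so unknown sessions can be dropped explicitly.
function sessionToSlug(sessionName) {
  const match = /^client_([a-z0-9_-]+)$/i.exec(sessionName || '');
  return match ? match[1] : null;
}

console.log(sessionToSlug('client_acme_corp')); // → "acme_corp"
console.log(sessionToSlug('johns-dental'));     // → null (freeform name, not routed)
```

Unlike a bare `replace('client_', '')`, this rejects names that merely contain the prefix somewhere, or that carry unexpected characters into the routing key.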
Why Not Separate Webhook Endpoints?
I tried this first. Each client got their own webhook URL: /webhook/client-acme, /webhook/client-globex, etc. Problems:
- WAHA webhook config per session is fragile. If you change the webhook URL, you need to restart the session.
- 50+ webhook endpoints in n8n are hard to manage. Each one is a separate workflow trigger, and you can't easily see which are active.
- Monitoring is harder. With a single gateway, I can log every incoming message in one place.
The single-gateway pattern means all messages flow through one chokepoint, which sounds scary but is actually easier to monitor, rate-limit, and debug.
Rate Limiting and Queue Management
WhatsApp is aggressive about rate limiting and banning numbers that send too many messages too fast. The exact limits aren't documented (it's an unofficial API), but from experience:
- New numbers: ~200 messages/day before risk increases
- Warmed-up numbers: ~1,000 messages/day is generally safe
- Burst limit: No more than 30 messages per minute per number
I implement rate limiting in n8n using a combination of Redis and workflow logic:
// n8n Function node: "Rate Limiter"
// Requires NODE_FUNCTION_ALLOW_EXTERNAL=ioredis on the n8n containers
const Redis = require('ioredis');
const redis = new Redis({ host: process.env.QUEUE_BULL_REDIS_HOST });

const item = $input.first().json;
const sessionName = item.session;
const minuteKey = `ratelimit:${sessionName}:min:${Math.floor(Date.now() / 60000)}`;
const dayKey = `ratelimit:${sessionName}:day:${new Date().toISOString().split('T')[0]}`;

// Increment counters; set TTLs on first use so stale keys expire
const minuteCount = await redis.incr(minuteKey);
if (minuteCount === 1) await redis.expire(minuteKey, 120);
const dayCount = await redis.incr(dayKey);
if (dayCount === 1) await redis.expire(dayKey, 86400);

// Close the connection -- a Function node that opens a new Redis
// connection per execution and never closes it will leak clients
await redis.quit();

const maxPerMinute = item.client_config?.max_per_minute || 25;
const maxPerDay = item.client_config?.max_per_day || 800;

// Check the daily cap first: no point delaying a message that
// shouldn't be sent at all today
if (dayCount > maxPerDay) {
  // Log and alert, don't send
  return [{
    json: { ...item, blocked: true, reason: 'daily_limit_exceeded' }
  }];
}

if (minuteCount > maxPerMinute) {
  // Queue for delayed sending at the top of the next minute
  return [{
    json: { ...item, delayed: true, delay_seconds: 60 - (Date.now() / 1000 % 60) }
  }];
}

return $input.all();
The per-client limits are stored in Supabase and cached in the workflow's static data. New clients start with conservative limits; I increase them gradually as their number warms up.
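The gradual warm-up can be encoded as a schedule instead of hand-edited limit rows. A sketch — the specific ramp numbers here are illustrative choices consistent with the observations above, not any documented WhatsApp guarantee:

```javascript
// Daily send limit as a function of how long the number has been
// active: start at 200/day and add 100/day each week until reaching
// the 800/day plateau. Thresholds are illustrative.
function dailyLimit(daysActive) {
  const base = 200;   // safe starting point for a new number
  const step = 100;   // weekly increase
  const cap = 800;    // conservative plateau for a warmed-up number
  return Math.min(cap, base + Math.floor(daysActive / 7) * step);
}

console.log(dailyLimit(0));  // → 200 (brand-new number)
console.log(dailyLimit(21)); // → 500 (three weeks in)
console.log(dailyLimit(90)); // → 800 (fully warmed up)
```

A nightly workflow could recompute this per client and write it back to the `max_per_day` column, removing the manual step entirely.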
Queue Priority
n8n's queue mode uses Bull (Redis-based job queue). All webhook-triggered workflows get the same priority by default. I haven't needed to implement priority queues because the worker concurrency of 10 handles the load — but if I needed to, Bull supports job priority natively.
The more important optimization is keeping workflows fast. A workflow that takes 500ms instead of 5s means the queue drains 10x faster. I aggressively use n8n's "Execute Workflow" node to break complex logic into small, focused sub-workflows.
Template Message Management
Most clients don't write their own message templates. They describe what they want ("send a reminder 24 hours before the appointment with the client's name and time"), and I build it.
Templates are stored in Supabase:
CREATE TABLE message_templates (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id UUID REFERENCES clients(id),
slug TEXT NOT NULL,
body TEXT NOT NULL,
variables JSONB DEFAULT '[]',
language TEXT DEFAULT 'he',
created_at TIMESTAMPTZ DEFAULT now(),
updated_at TIMESTAMPTZ DEFAULT now(),
UNIQUE(client_id, slug)
);
-- Example insert
INSERT INTO message_templates (client_id, slug, body, variables)
VALUES (
'abc-123',
'appointment_reminder',
'שלום {{name}}, תזכורת לתור שלך ב-{{date}} בשעה {{time}}. לאישור השב 1, לביטול השב 2.',
'["name", "date", "time"]'
);
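Since the `variables` column duplicates what's embedded in the body, it's worth validating that the two agree before saving a template. A small sketch of that check (the `{{name}}` placeholder syntax matches the templates above; the function name is mine):

```javascript
// Extract {{placeholder}} names from a template body and compare
// them against the declared variables list. Returns both kinds of
// mismatch so a save can be rejected with a useful error.
function checkTemplate(body, declared) {
  const used = [...body.matchAll(/\{\{(\w+)\}\}/g)].map(m => m[1]);
  return {
    undeclared: used.filter(v => !declared.includes(v)),  // in body, not in list
    unused: declared.filter(v => !used.includes(v)),      // in list, not in body
  };
}

const result = checkTemplate(
  'Hello {{name}}, see you at {{time}}.',
  ['name', 'date', 'time']
);
console.log(result); // → { undeclared: [], unused: ['date'] }
```

An `unused` entry is usually harmless; an `undeclared` one means the workflow will never fill that placeholder and the customer sees raw `{{...}}` text.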
The n8n workflow fetches the template, interpolates variables, and sends via WAHA:
// n8n Function node: "Render Template"
const template = $input.first().json.template;
const variables = $input.first().json.variables || {};

let body = template.body;
for (const [key, value] of Object.entries(variables)) {
  // String(value ?? '') avoids printing "undefined" into the message
  body = body.replace(new RegExp(`\\{\\{${key}\\}\\}`, 'g'), String(value ?? ''));
}

return [{
  json: {
    session: $input.first().json.session,
    to: $input.first().json.phone,
    body: body
  }
}];
Then an HTTP Request node sends it:
POST ${WAHA_URL}/api/sendText
Headers:
X-API-KEY: ${WAHA_API_KEY}
Body:
{
"session": "{{$json.session}}",
"chatId": "{{$json.to}}@c.us",
"text": "{{$json.body}}"
}
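The same call can be built programmatically. Keeping request construction pure makes the `@c.us` suffix handling testable (`/api/sendText`, `chatId`, and the `X-API-KEY` header match WAHA's API as used above; the URL and key values are placeholders):

```javascript
// Build the WAHA sendText request. Individual WhatsApp chat IDs are
// the phone number plus "@c.us"; guard against double-appending it.
function sendTextRequest(wahaUrl, apiKey, session, phone, text) {
  const chatId = phone.endsWith('@c.us') ? phone : `${phone}@c.us`;
  return {
    url: `${wahaUrl}/api/sendText`,
    method: 'POST',
    headers: { 'X-API-KEY': apiKey, 'Content-Type': 'application/json' },
    body: JSON.stringify({ session, chatId, text }),
  };
}

const req = sendTextRequest(
  'http://localhost:3000', 'secret', 'client_acme_corp', '972501234567', 'Hi!'
);
console.log(JSON.parse(req.body).chatId); // → "972501234567@c.us"
```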
Monitoring and Health Checks
With 50+ bots running, things break silently. A WhatsApp session disconnects. A workflow errors out. Redis fills up. You need to know before the client calls you.
n8n Health Monitoring
I run a monitoring script via cron every 5 minutes:
#!/bin/bash
# /opt/n8n-docker-caddy/scripts/monitor-n8n.sh
N8N_URL="http://localhost:5678"
WEBHOOK_URL="http://localhost:5678/webhook/monitoring-alert"

# Check n8n main process
if ! curl -sf "${N8N_URL}/healthz" > /dev/null 2>&1; then
  curl -s -X POST "${WEBHOOK_URL}" \
    -H 'Content-Type: application/json' \
    -d '{"alert": "n8n_main_down", "severity": "critical"}'
fi

# Check worker
WORKER_STATUS=$(docker inspect --format='{{.State.Health.Status}}' n8n-worker 2>/dev/null)
if [ "$WORKER_STATUS" != "healthy" ]; then
  curl -s -X POST "${WEBHOOK_URL}" \
    -H 'Content-Type: application/json' \
    -d '{"alert": "n8n_worker_unhealthy", "status": "'"$WORKER_STATUS"'"}'
fi

# Check Redis memory -- read used_memory (bytes) rather than
# used_memory_human, which may come back as K, M, or G
REDIS_BYTES=$(docker exec n8n-redis redis-cli info memory | grep '^used_memory:' | cut -d: -f2 | tr -d '[:space:]')
REDIS_MB=$(( REDIS_BYTES / 1024 / 1024 ))
# Alert if over 200MB
if [ "$REDIS_MB" -gt 200 ]; then
  curl -s -X POST "${WEBHOOK_URL}" \
    -H 'Content-Type: application/json' \
    -d '{"alert": "redis_memory_high", "usage_mb": "'"$REDIS_MB"'"}'
fi

# Check PostgreSQL
if ! docker exec n8n-postgres pg_isready -U n8n > /dev/null 2>&1; then
  curl -s -X POST "${WEBHOOK_URL}" \
    -H 'Content-Type: application/json' \
    -d '{"alert": "postgres_down", "severity": "critical"}'
fi
The monitoring webhook triggers an n8n workflow that sends me a WhatsApp alert. Yes, I use my own platform to monitor my own platform. If the whole stack is down, the alert won't fire — but Hetzner's monitoring catches full server outages.
WAHA Session Monitoring
// n8n Cron workflow: runs every 10 minutes
// Previous node: HTTP Request -> GET ${WAHA_URL}/api/sessions
const sessions = $input.first().json;

const disconnected = sessions.filter(s =>
  s.status !== 'WORKING' && s.name.startsWith('client_')
);

if (disconnected.length > 0) {
  // Try to restart disconnected sessions; one failed restart
  // shouldn't abort the rest of the loop
  for (const session of disconnected) {
    try {
      await this.helpers.httpRequest({
        method: 'POST',
        url: `${process.env.WAHA_URL}/api/sessions/${session.name}/restart`,
        headers: { 'X-API-KEY': process.env.WAHA_API_KEY }
      });
    } catch (err) {
      console.log(`Restart failed for ${session.name}: ${err.message}`);
    }
  }

  // Alert me
  return [{
    json: {
      alert: 'sessions_disconnected',
      sessions: disconnected.map(s => s.name),
      action: 'auto_restart_attempted'
    }
  }];
}

return []; // All good, no output
Database Backup Strategy
I learned this the hard way. n8n originally used SQLite, and I experienced data corruption that lost workflow execution history. I migrated to PostgreSQL 16 and implemented automated daily backups:
#!/bin/bash
# /opt/n8n-docker-caddy/scripts/pg-backup.sh
# Runs daily at 3:00 AM via cron
set -euo pipefail

BACKUP_DIR="/opt/n8n-backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="${BACKUP_DIR}/n8n_${TIMESTAMP}.dump"

# Create backup using custom format (supports parallel restore)
docker exec n8n-postgres pg_dump \
  -U n8n \
  -d n8n \
  -Fc \
  -f /tmp/backup.dump

# Copy from container
docker cp n8n-postgres:/tmp/backup.dump "${BACKUP_FILE}"

# Verify the backup is not empty BEFORE pruning old ones --
# otherwise a failed backup plus the prune could leave you with nothing
if [ ! -s "${BACKUP_FILE}" ]; then
  echo "BACKUP FAILED: Empty file" >&2
  exit 1
fi

# Clean up old backups (keep 7 days)
find "${BACKUP_DIR}" -name "*.dump" -mtime +7 -delete

echo "Backup completed: ${BACKUP_FILE} ($(du -h "${BACKUP_FILE}" | cut -f1))"
Restore when needed:
docker exec -i n8n-postgres pg_restore \
-U n8n -d n8n \
--clean --if-exists \
< /opt/n8n-backups/n8n_20260320_030000.dump
Scaling Lessons Learned
1. Worker Concurrency Is Not "More Is Better"
I started with worker concurrency at 20 (and QUEUE_BULL_DEFAULT_JOB_OPTIONS_ATTEMPTS=3 for retries). Workflows started failing with database connection pool exhaustion: PostgreSQL's default max_connections is 100, and each concurrently executing workflow holds a connection.
Concurrency of 10 with the default connection pool is the sweet spot. If you need more throughput, add another worker container — don't increase concurrency on a single worker.
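The connection math is worth making explicit before touching either knob. A sketch of the budget check (the one-connection-per-concurrent-execution assumption matches what I observed, not a documented n8n contract; the reserve of 20 is a margin I picked for the main process and ad-hoc psql sessions):

```javascript
// Rough Postgres connection budget: each concurrently executing
// workflow holds a connection, so total demand is roughly
// workers * concurrency. Keep headroom below max_connections.
function maxWorkers(maxConnections, concurrency, reserved = 20) {
  return Math.floor((maxConnections - reserved) / concurrency);
}

// With Postgres' default max_connections = 100 and concurrency 10:
console.log(maxWorkers(100, 10)); // → 8 workers fit in the budget
// At concurrency 20, the same budget only allows 4 workers:
console.log(maxWorkers(100, 20)); // → 4
```

This is why "add another worker at concurrency 10" scales further than "one worker at concurrency 40": the connection demand is the same, but per-process memory and failure blast radius are smaller.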
2. Static Data Is Your Friend (and Enemy)
n8n's $getWorkflowStaticData('global') persists data across executions in memory. I use it for caching client configs so I don't hit Supabase on every message. But it's per-process — the main process and the worker have different static data. And it's lost on restart.
Pattern: use static data as a cache, always fall back to the database.
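In isolation, that pattern looks like this (`fetchFromDb` is a stand-in for the Supabase lookup, and `cache` stands in for the static-data object):

```javascript
// Cache-aside lookup: check the in-memory cache first, fall back to
// the database, and tolerate cache loss (restarts wipe it, and each
// process -- main vs worker -- has its own copy).
async function getClientConfig(slug, cache, fetchFromDb) {
  if (cache[slug]) return cache[slug];     // warm hit, no DB round-trip
  const config = await fetchFromDb(slug);  // cold: go to the database
  if (config) cache[slug] = config;        // populate for next time
  return config || null;
}

// Usage with a stubbed "database":
const cache = {};
const db = async (slug) => ({ slug, max_per_day: 800 });
getClientConfig('acme', cache, db).then(c => console.log(c.max_per_day)); // → 800
```

The key property: correctness never depends on the cache. Losing it costs one extra query per client, nothing more.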
3. WAHA Session Limits
A single WAHA container comfortably handles 20-30 active sessions on a 4GB RAM server. Beyond that, I've seen memory pressure cause session disconnects. For 50+ sessions, either allocate more RAM or run multiple WAHA containers with session affinity.
4. PostgreSQL Tuning
Enable pg_stat_statements for query performance tracking. n8n generates some heavy queries for execution history. The default shared_buffers and work_mem settings are fine for up to ~30,000 stored executions, but with the 50,000 cap I use, bumping shared_buffers to 256MB made a noticeable difference.
5. Webhook Timeouts
WAHA has a default webhook timeout. If your n8n webhook takes too long to respond (because the queue is full or the workflow is complex), WAHA will retry. This causes duplicate message processing.
Solution: make the gateway workflow as fast as possible. It should only parse, route, and enqueue — never do the actual work inline. The heavy lifting happens asynchronously via "Execute Workflow."
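Even with a fast gateway, retries still happen, so it pays to de-duplicate on the message ID in the webhook payload. A minimal in-memory sketch of the idea — in production this belongs in Redis with a TTL, since workflow memory is per-process and lost on restart:

```javascript
// Drop webhook deliveries we've already seen within a sliding
// window. Keyed by message id; old entries expire so the map
// doesn't grow without bound.
function makeDeduper(windowMs = 10 * 60 * 1000) {
  const seen = new Map(); // message id -> timestamp of first delivery
  return function isDuplicate(messageId, now = Date.now()) {
    for (const [id, ts] of seen) {
      if (now - ts > windowMs) seen.delete(id); // expire stale entries
    }
    if (seen.has(messageId)) return true;
    seen.set(messageId, now);
    return false;
  };
}

const isDup = makeDeduper();
console.log(isDup('msg_1')); // → false (first delivery, process it)
console.log(isDup('msg_1')); // → true  (retry, drop it)
console.log(isDup('msg_2')); // → false
```

The Redis version is one `SET key 1 NX EX 600` per message: if the set succeeds, process; if the key already existed, drop.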
Cost Breakdown
Here's what it costs to run 50+ WhatsApp bots on shared infrastructure, hosted on Hetzner Cloud (Germany/Finland data centers):
| Component | Server Spec | Monthly Cost |
|---|---|---|
| n8n (queue mode) | CPX31 (4 vCPU, 8GB RAM) | ~$15 |
| WAHA | CPX21 (3 vCPU, 4GB RAM) | ~$10 |
| Supabase (self-hosted) | CPX21 (3 vCPU, 4GB RAM) | ~$10 |
| Chatwoot | CPX21 (3 vCPU, 4GB RAM) | ~$10 |
| Website + misc | CX22 (2 vCPU, 4GB RAM) | ~$6 |
| Hetzner daily backups | All servers | ~$5 |
| Domain + DNS | Cloudflare | Free |
| **Total** | | **~$56/month** |
That's roughly $1.10 per bot per month in infrastructure costs.
For comparison, a managed WhatsApp Business API provider charges $50-200/month per number. Official BSP (Business Solution Provider) plans start at $100/month per number for basic messaging.
The tradeoff: I'm using an unofficial API (WAHA wraps WhatsApp Web, not the official Business API). This means:
- No green checkmark verification
- Risk of number bans if you abuse limits
- No official SLA
- But: no per-message fees, no approval process, and full control
For my clients — small businesses doing appointment reminders and customer support — this tradeoff makes sense. For enterprise-scale marketing campaigns, use the official API.
What I'd Do Differently
Start with PostgreSQL, not SQLite. The migration was painful and I lost some data. n8n's SQLite mode is fine for personal use, not production.
Build the monitoring first. I added monitoring after the third time a session silently disconnected and I found out from an angry client.
Standardize session naming earlier. I started with freeform names (`johns-dental`, `pizzaplace`) and later switched to `client_{slug}`. Migrating was annoying.

Use Supabase Edge Functions for the rate limiter instead of doing it in n8n. The Redis-in-n8n approach works but is harder to test and maintain.
Stack Summary
| Tool | Role | Why This One |
|---|---|---|
| n8n | Workflow automation engine | Visual builder, self-hosted, queue mode, active community |
| WAHA | WhatsApp Web API | Multi-session, REST API, Go engine for stability |
| Supabase | Database + API | PostgreSQL with instant REST API, row-level security |
| Chatwoot | Customer support | Open-source, WhatsApp inbox, agent assignment |
| Caddy | Reverse proxy | Automatic HTTPS, simple config |
| Hetzner Cloud | Hosting | European data centers, great price/performance |
If you're building something similar or have questions about the architecture, find me at achiya-automation.com/en.
Achiya Cohen is an automation engineer specializing in WhatsApp automation and business process workflows. He runs Achiya Automation from Israel, serving businesses with n8n-based automation solutions.