Everyone told me to use Kubernetes. Docker Compose at minimum. "You can't run 45 services on bare metal," they said.
I did it anyway. Here's how — and why it actually works.
The Problem
I needed to run a developer API platform — think a unified gateway giving developers access to 40+ APIs (geolocation, crypto prices, screenshots, DNS, web scraping, code execution, etc.) through a single API key.
Each API is a separate microservice. Some are Fastify (ESM), some are Express (CJS). Each binds to its own port. A central gateway proxies authenticated requests to the right service.
The conventional wisdom says: containerize everything, orchestrate with K8s, use service mesh for discovery. But I'm a solo developer on a single VPS. That's like bringing a tractor to mow a lawn.
The Architecture
                 Internet
                    │
                ┌───┴───┐
                │ nginx │  (reverse proxy, SSL, rate limiting)
                └───┬───┘
                    │
        ┌───────────┼───────────┐
        │           │           │
   port 3000    port 3001   port 3002   ...  port 3045
   ┌────┴────┐  ┌───┴───┐   ┌───┴────┐
   │ gateway │  │  geo  │   │ crypto │  ...  42 more services
   └────┬────┘  └───────┘   └────────┘
        │
   ┌────┴────┐
   │ SQLite  │  (shared analytics DB)
   └─────────┘
Key components:
- Nginx: Routes external traffic, handles SSL termination, enforces rate limits
- PM2: Process manager — starts, monitors, restarts all 45 services
- Gateway: Central auth layer — validates API keys, proxies to internal services
- SQLite (WAL mode): Shared request logging across all services
Why PM2 Instead of Containers?
Three reasons:
1. Memory
Each Node.js process uses ~80-100 MB of RAM. With 45 services, that's about 4.5 GB total. On a 20 GB VPS, that leaves plenty of headroom.
Containers add overhead of their own — image layers, the daemon and runtime, per-container shims. Call it ~50-100 MB per container: with 45 containers, that's an extra 2-4 GB eaten by infrastructure, not your code.
# Actual memory usage across 45 services
$ pm2 monit
┌────────────────────┬──────────┐
│ Service            │ Memory   │
├────────────────────┼──────────┤
│ agent-gateway      │ 141 MB   │
│ agent-geo          │ 237 MB   │  ← GeoIP database loaded in memory
│ agent-screenshot   │ 135 MB   │  ← Puppeteer instance
│ agent-dns          │  92 MB   │
│ fair-games         │  88 MB   │
│ ...                │ ~80-100  │
│ Average            │ ~103 MB  │
└────────────────────┴──────────┘
2. Startup Speed
PM2 starts all 45 services in about 10 seconds. Docker Compose with 45 containers takes 30-60 seconds, sometimes longer if images need pulling.
When a service crashes, PM2 restarts it in under a second. Docker restart policies add several seconds of overhead.
3. Simplicity
My entire deployment is:
git pull && pm2 restart service-name
No Dockerfiles, no docker-compose.yml with 500 lines, no image registry, no overlay networks. Just Node.js processes managed by PM2.
The Nginx Layer
Each service gets its own nginx config block. The pattern is simple:
# /etc/nginx/sites-available/agent-geo
server {
    listen 443 ssl;
    server_name geo.yourdomain.com;

    location / {
        proxy_pass http://127.0.0.1:3034;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Host $host;
    }
}
Internal services bind only to 127.0.0.1, so they're never directly reachable from the internet — every request passes through nginx. API traffic goes one step further: it enters via the gateway, which validates the API key and proxies to the right internal service.
User → nginx → gateway (port 3000) → validates API key → proxy to internal service
Shared Infrastructure: The Request Logger
Every service needs logging. Instead of each service implementing its own, I built a shared Fastify plugin:
// shared/request-logger.js
import Database from 'better-sqlite3';

const db = new Database('/path/to/analytics.db');

// WAL mode for concurrent writes from 45 services
db.pragma('journal_mode = WAL');

const insert = db.prepare(`
  INSERT INTO requests (service, method, path, status, response_time_ms, ip, user_agent)
  VALUES (?, ?, ?, ?, ?, ?, ?)
`);

export default function requestLogger(fastify, opts, done) {
  fastify.addHook('onResponse', (req, reply, hookDone) => {
    insert.run(
      opts.serviceName,
      req.method,
      req.url,
      reply.statusCode,
      Math.round(reply.elapsedTime),
      req.headers['x-real-ip'] || req.ip,
      req.headers['user-agent']
    );
    hookDone(); // callback-style hooks must signal completion
  });
  done(); // same for callback-style plugins
}
Each service registers it with one line:
await app.register(requestLogger, { serviceName: 'agent-geo' });
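The plugin assumes a `requests` table already exists. A plausible schema, inferred from the `INSERT` columns above — the types and the `timestamp` default are my assumptions:

```sql
-- Hypothetical schema for the shared analytics DB
CREATE TABLE IF NOT EXISTS requests (
  id               INTEGER PRIMARY KEY AUTOINCREMENT,
  service          TEXT NOT NULL,
  method           TEXT NOT NULL,
  path             TEXT NOT NULL,
  status           INTEGER,
  response_time_ms INTEGER,
  ip               TEXT,
  user_agent       TEXT,
  timestamp        TEXT DEFAULT (datetime('now'))
);

-- The dashboard queries filter on time, so index it
CREATE INDEX IF NOT EXISTS idx_requests_ts ON requests (timestamp);
```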
45 services → 1 analytics database → 1 dashboard to query. No Prometheus, no Grafana, no ELK stack. Just SQLite and a few SQL queries:
-- What's getting hit?
SELECT service, COUNT(*) AS hits
FROM requests
WHERE timestamp > datetime('now', '-24 hours')
GROUP BY service
ORDER BY hits DESC;

-- Approximate p95 response time per service:
-- NTILE(20) splits each service's requests into 20 buckets,
-- so the top of bucket 19 sits at roughly the 95th percentile
SELECT service, MAX(response_time_ms) AS p95_ms
FROM (
  SELECT service, response_time_ms,
         NTILE(20) OVER (PARTITION BY service ORDER BY response_time_ms) AS tile
  FROM requests
  WHERE timestamp > datetime('now', '-1 hour')
)
WHERE tile = 19
GROUP BY service;
Health Monitoring
A simple health checker runs every 5 minutes via cron:
// health-monitor/check.js
const services = [
  { name: 'gateway', port: 3000, path: '/health' },
  { name: 'geo', port: 3034, path: '/health' },
  { name: 'crypto', port: 3008, path: '/health' },
  // ... 42 more
];

const unhealthy = [];

for (const svc of services) {
  try {
    const res = await fetch(`http://127.0.0.1:${svc.port}${svc.path}`, {
      signal: AbortSignal.timeout(5000)
    });
    if (!res.ok) throw new Error(`HTTP ${res.status}`);
  } catch (err) {
    unhealthy.push(`${svc.name}: ${err.message}`);
  }
}

if (unhealthy.length > 0) {
  // Send ONE alert with all failures (not one per service)
  sendAlert(`Health check: ${unhealthy.length} services down\n${unhealthy.join('\n')}`);
}
Key insight: batch your alerts. When all 45 services restart simultaneously, you don't want 45 individual notifications. One message listing all failures is enough.
The Gotchas
1. SQLite WAL Lock Contention
When all 45 services restart at once (like after pm2 restart all), they all try to open the SQLite database simultaneously. About 20 of them fail with lock errors and crash.
Fix: Restart in batches of 5-10 services at a time.
# Don't do this:
pm2 restart all
# Do this instead:
pm2 restart agent-gateway agent-geo agent-dns agent-crypto agent-scraper
sleep 2
pm2 restart agent-screenshot agent-email agent-wallet agent-paste agent-search
# ... and so on
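Scripting the batching keeps it from drifting as services are added. A rough helper — it dry-runs with `echo`; swap in the real `pm2 restart` line for actual use:

```shell
# Hypothetical helper: restart PM2 services 5 at a time with a pause
# between batches, so they don't all open the SQLite DB at once.
restart_in_batches() {
  batch=""
  count=0
  for svc in "$@"; do
    batch="$batch $svc"
    count=$((count + 1))
    if [ "$count" -eq 5 ]; then
      echo "pm2 restart$batch"   # real use: pm2 restart $batch
      sleep 2
      batch=""
      count=0
    fi
  done
  # flush the final partial batch
  [ -n "$batch" ] && echo "pm2 restart$batch"
  return 0
}
```

Feed it the output of `pm2 jlist` (or your port map) and the batching logic lives in one place.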
2. Port Management
With 45 services, keeping track of ports gets messy. I allocate them sequentially:
3000 - gateway
3001 - service A
3002 - service B
...
3045 - service Z
Document your port map. You WILL forget which service is on which port otherwise.
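One way to make the port map enforce itself: keep it as a single checked-in list and fail fast on collisions at startup. A small illustrative check — the service list here is a hypothetical stand-in for a real `ports.json`:

```javascript
// Guard against two services claiming the same port.
// In practice `services` would be loaded from a checked-in ports file.
const services = [
  { name: 'gateway', port: 3000 },
  { name: 'agent-geo', port: 3034 },
  { name: 'agent-crypto', port: 3008 },
];

function findDuplicatePorts(list) {
  const seen = new Map(); // port -> first service that claimed it
  const dupes = [];
  for (const { name, port } of list) {
    if (seen.has(port)) {
      dupes.push(`${port}: ${seen.get(port)} vs ${name}`);
    } else {
      seen.set(port, name);
    }
  }
  return dupes;
}

if (findDuplicatePorts(services).length > 0) {
  throw new Error(`Duplicate ports: ${findDuplicatePorts(services).join(', ')}`);
}
```

Run it in CI or at gateway boot and a copy-pasted port number becomes a loud failure instead of a silent EADDRINUSE crash loop.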
3. Shared Dependencies
The node_modules folder for 45 services adds up. Each one has its own node_modules because they have slightly different dependency trees. This eats about 8 GB of disk space.
Potential fix: Use a monorepo with shared node_modules at the root. I haven't done this yet because migration is painful, but it's on the list.
4. Mixed Module Systems
Some services are ESM (Fastify), some are CJS (Express). The shared request logger needs to work with both. Solution: two versions of the plugin.
shared/request-logger.js # ESM (for Fastify services)
shared/request-logger-express.cjs # CJS (for Express services)
The .cjs extension is critical when your package.json has "type": "module".
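For illustration, the CJS/Express variant might look like this — the factory shape and the injected `insert` statement are my assumptions, mirroring the Fastify plugin's column order:

```javascript
// shared/request-logger-express.cjs — CJS sketch for Express services.
// `insert` is the same prepared better-sqlite3 statement as the ESM
// version; injected here so the sketch stays dependency-free.
function createRequestLogger(serviceName, insert) {
  return (req, res, next) => {
    const start = process.hrtime.bigint();
    // 'finish' fires once the response has been handed to the OS
    res.on('finish', () => {
      const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
      insert.run(
        serviceName,
        req.method,
        req.originalUrl || req.url,
        res.statusCode,
        Math.round(elapsedMs),
        req.headers['x-real-ip'] || req.ip,
        req.headers['user-agent']
      );
    });
    next();
  };
}

module.exports = createRequestLogger;
```

An Express service would then do `app.use(createRequestLogger('agent-wallet', insert))` and get the same rows in the same table as the Fastify services.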
When This Architecture Works
This setup makes sense when:
- You're a solo developer or small team
- Your services are all the same runtime (Node.js)
- Traffic is moderate (thousands of requests/day, not millions)
- You want simplicity over enterprise scalability
- You can tolerate brief downtime during deploys
When It Doesn't
Don't do this if:
- You need horizontal scaling (multiple servers)
- Services use different runtimes (Go, Python, Rust mixed in)
- You need zero-downtime deployments
- You're running in a team where multiple people deploy simultaneously
- You actually have the traffic to justify K8s
The Result
45 services running stably with 11+ hours of uptime, ~4.6 GB total memory footprint, and the whole thing managed with pm2 commands and nginx configs. No container registry, no Helm charts, no YAML files longer than a CVS receipt.
Is it "production-grade" by enterprise standards? No. Does it work perfectly for a developer API platform serving real traffic? Absolutely.
Try the APIs
If you want to see what 45 microservices behind one gateway looks like from the outside:
# Get a free API key (200 credits, no signup)
curl -X POST https://agent-gateway-kappa.vercel.app/api/keys/create
# Try IP geolocation
curl "https://agent-gateway-kappa.vercel.app/v1/agent-geo/geo/8.8.8.8" \
-H "Authorization: Bearer YOUR_KEY"
# Get crypto prices
curl "https://agent-gateway-kappa.vercel.app/v1/crypto-feeds/api/prices" \
-H "Authorization: Bearer YOUR_KEY"
40+ services, one key, one curl command. That's the point of running all of this on one box — unified access.
Full API catalog: https://api-catalog-three.vercel.app
What's your take — would you go bare-metal PM2, or is K8s always the answer? Drop a comment below.