Timevolt

Posted on Jun 2

Horizontal vs Vertical Scaling: What Actually Works in Production

#devops #docker #kubernetes #cicd

Quick context

I was on call last month when our API started timing out under a modest traffic spike. The service was a simple Node/Express app talking to a Postgres database, deployed on a single t3.large EC2 instance. We’d been told to “just bump the instance size” to buy some breathing room. I spun up a t3.xlarge, redeployed, and watched CPU drop from 90% to 45%, only to see latency creep back up 15 minutes later. It turned out we were hitting a database connection bottleneck, not a CPU one. I spent three hours that night digging through CloudWatch metrics before realizing we’d been solving the wrong problem. It forced me to rethink what “scaling” really means for our stack.

The Insight

Vertical scaling (bigger VM) helps when the bottleneck is a single resource you can throw more CPU, RAM, or disk at. Horizontal scaling (more instances) shines when the workload can be split—think stateless request handling, background workers, or read‑replicas for a DB. The catch is that you often need to redesign parts of your app to take advantage of the extra nodes: shared state, sticky sessions, or a single writer database become blockers. If you ignore those, you’ll just be paying for more machines that sit idle while the same bottleneck throttles you.

In practice, most web services hit a mix of both limits. You scale the database vertically until licensing or hardware costs explode, then add read replicas and sharding. The app tier? Keep it stateless and put more containers behind a load balancer. That split approach saved us from a costly over‑provisioning cycle and gave us a clear path for future growth.
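To make the “find the real bottleneck first” step concrete, here is a minimal triage sketch. The function name, the metric fields, and the 80% thresholds are all assumptions I picked for illustration; they are not values from our incident, and you should tune them against your own dashboards.

```javascript
// Hypothetical triage helper. Field names and thresholds are illustrative
// assumptions, not measurements from the outage described above.
function suggestScaling(metrics) {
  const { cpuPercent, memoryPercent, poolWaiting, dbConnections, dbMaxConnections } = metrics;

  // Requests queueing for a DB connection, or a nearly full connection table,
  // point at the database rather than the app instance.
  const dbSaturated = poolWaiting > 0 || dbConnections / dbMaxConnections > 0.8;
  if (dbSaturated) {
    return 'database: add a connection pooler or read replicas before resizing the app';
  }

  // A single hot local resource: a bigger box (or more boxes) can help.
  if (cpuPercent > 80 || memoryPercent > 80) {
    return 'app instance: scale vertically, or horizontally if the tier is stateless';
  }

  return 'no obvious bottleneck: keep measuring before paying for more capacity';
}

// Roughly the shape of our situation after the resize: CPU looked fine,
// but requests were waiting on database connections.
console.log(suggestScaling({
  cpuPercent: 45,
  memoryPercent: 50,
  poolWaiting: 12,
  dbConnections: 95,
  dbMaxConnections: 100,
}));
```

The ordering is the point: the database check runs first, because a bigger instance does nothing for requests that are stuck waiting on a connection.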

How (with code)

Below is a stripped‑down version of our Node service that talks to Postgres. I’ll walk through two common mistakes people make when they try to “scale” by just adding more instances, with the fix for each, and then one quick win for background workers.

Mistake #1: Storing session data in memory

// server.js – naive in‑memory session store
const express = require('express');
const session = require('express-session');
const app = express();

app.use(session({
  secret: 'super-secret',            // hard-coded for the demo; load it from env in real code
  resave: false,
  saveUninitialized: true,
  store: new session.MemoryStore()   // <-- problem: lives in this process only
}));

app.get('/login', (req, res) => {
  // Placeholder id: a real handler authenticates first and uses the verified user's id
  const userId = 1;
  req.session.user = { id: userId, role: 'member' };
  res.redirect('/dashboard');
});

app.listen(3000);

What goes wrong: When you spin up a second instance behind a load balancer, a user whose request lands on instance A gets their session stored only in that instance’s memory. The next request might hit instance B, which has no idea who the user is, so they get logged out. You’ll see intermittent 401s that look like “random auth failures.”

Fix: Move the session store to an external, shared datastore like Redis.

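// server.js – using Redis for sessions (horizontal‑safe)
// Written against redis v3 and connect-redis v6 or earlier; newer majors changed
// both the createClient options and how RedisStore is imported.
const express = require('express');
const session = require('express-session');
const redis = require('redis');
const connectRedis = require('connect-redis');

const app = express();
const RedisStore = connectRedis(session);

const redisClient = redis.createClient({
  host: process.env.REDIS_HOST,
  port: process.env.REDIS_PORT
});

app.use(session({
  store: new RedisStore({ client: redisClient }),  // shared by every instance
  secret: process.env.SESSION_SECRET,
  resave: false,
  saveUninitialized: false
}));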

Now every instance reads/writes session data to the same Redis cluster, so the auth state follows the user wherever they land.
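If you want to see the failure mode without standing up Redis, here is a small simulation using only plain Node. Everything in it is a stand‑in: makeInstance and makeBalancer are hypothetical helpers, a Map plays the role of the session store, and the “load balancer” is simple round‑robin.

```javascript
// Each "instance" only knows about the store it was given.
function makeInstance(name, store) {
  return {
    name,
    login(sessionId, user) { store.set(sessionId, user); },
    whoAmI(sessionId) { return store.get(sessionId) ?? null; },
  };
}

// Round-robin load balancer: each call returns the next instance in turn.
function makeBalancer(instances) {
  let next = 0;
  return () => instances[next++ % instances.length];
}

// Case 1: every instance has a private Map (the MemoryStore situation).
const privateLb = makeBalancer([
  makeInstance('A', new Map()),
  makeInstance('B', new Map()),
]);
privateLb().login('sid-1', { id: 42 });            // request lands on A
const lostSession = privateLb().whoAmI('sid-1');   // next request lands on B

// Case 2: both instances share one Map (standing in for Redis).
const shared = new Map();
const sharedLb = makeBalancer([
  makeInstance('A', shared),
  makeInstance('B', shared),
]);
sharedLb().login('sid-1', { id: 42 });             // request lands on A
const keptSession = sharedLb().whoAmI('sid-1');    // next request lands on B

console.log({ lostSession, keptSession });
```

In the first case instance B has never heard of sid-1, so lostSession is null; in the second, B finds the user that A stored. That is the whole difference between the two server.js versions above.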

Mistake #2: Assuming the DB can handle unlimited connections from many app nodes

// db.js – naive pool configuration
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  // no max set: node-postgres defaults to 10 connections per pool,
  // and every process that loads this module gets its own pool
});

module.exports = { pool };

The limit is per pool, not per fleet. When we added three more app containers, every process brought its own pool and nothing capped the total, so the combined connection count ran past what the Postgres instance would accept (stock Postgres defaults max_connections to 100). The DB started rejecting new connections with “too many clients,” and our error rate spiked.

Fix: Set an explicit max on each pool, sized from what your DB can actually handle, and put a connection pooler like PgBouncer between the app nodes and Postgres so the total stays bounded no matter how many instances you add.

// db.js – pool with explicit limit, expecting a proxy
const { Pool } = require('pg');

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20   // adjust after measuring DB capacity
});

module.exports = { pool };

Deploy PgBouncer in front of Postgres, point all app nodes at it, and let the pooler multiplex dozens of app connections into a smaller set of real PostgreSQL backend connections. This keeps the DB happy while still allowing horizontal growth of the API layer.
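For picking that max value, a rough budget calculation helps. This is a minimal sketch under stated assumptions: poolSizePerProcess and worstCaseConnections are hypothetical helpers, the numbers in the example are made up, and the arithmetic describes app processes connecting straight to Postgres. With PgBouncer in the path, the same budget applies to the pooler’s server‑side connections instead.

```javascript
// Hypothetical helper: split a Postgres connection budget evenly across app processes.
function poolSizePerProcess({ maxConnections, reserved, instances, processesPerInstance }) {
  const usable = maxConnections - reserved;           // leave room for admin and migration sessions
  const processes = instances * processesPerInstance;
  if (usable <= 0 || processes <= 0) {
    throw new RangeError('need a positive connection budget and at least one process');
  }
  // Round down so the whole fleet can never exceed the budget.
  const perProcess = Math.floor(usable / processes);
  if (perProcess < 1) {
    throw new RangeError('more processes than usable connections: put a pooler in front');
  }
  return perProcess;
}

// Worst case the database can see if every pool fills up at once.
function worstCaseConnections({ instances, processesPerInstance, poolMax }) {
  return instances * processesPerInstance * poolMax;
}

// Example numbers (assumptions): 100 connections, 10 held back, 4 single-process containers.
const perProcess = poolSizePerProcess({
  maxConnections: 100,
  reserved: 10,
  instances: 4,
  processesPerInstance: 1,
});
console.log(perProcess); // 22

// With max: 20 as in db.js above, the same fleet tops out at 80 connections.
console.log(worstCaseConnections({ instances: 4, processesPerInstance: 1, poolMax: 20 })); // 80
```

The useful habit is to rerun this arithmetic every time the instance count changes, since a max that was safe at four containers may not be safe at twelve.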

Quick win: Stateless workers

If you have background jobs (e.g., image processing), make sure they don’t rely on local disk state.

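// worker.js – pulls jobs from a shared queue (Bull on Redis)
const Queue = require('bull');
const sharp = require('sharp');
const AWS = require('aws-sdk');   // SDK v2, which provides the .promise() call below

const s3 = new AWS.S3();
const imageQueue = new Queue('image-processing', {
  redis: { host: process.env.REDIS_HOST, port: process.env.REDIS_PORT }
});

imageQueue.process(async (job) => {
  const { buffer, filename } = job.data;
  // Bull stores job data as JSON, so a Buffer enqueued by the producer
  // arrives here as { type: 'Buffer', data: [...] } and must be rebuilt
  const input = Buffer.from(buffer.data);
  // do work – everything stays in memory, no local files needed
  const out = await sharp(input)
    .resize(800)
    .toBuffer();
  // store the result in S3 so any instance can read it
  await s3.upload({ Bucket: 'processed-imgs', Key: filename, Body: out }).promise();
});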

Because the job data lives in Redis (or S3) and the worker only needs CPU/RAM, you can scale the worker fleet out horizontally without worrying about file‑system sync.
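One detail worth checking before you put image bytes straight into a job: the queue serializes job data as JSON. The sketch below imitates that round trip with JSON.stringify and JSON.parse instead of a real queue, and compares embedding the bytes with passing a reference. The sourceKey field and the uploads/cat.png key are hypothetical names, not part of the worker above.

```javascript
// Imitate what a Redis-backed queue does to job data: JSON out, JSON back in.
const roundTrip = (data) => JSON.parse(JSON.stringify(data));

const image = Buffer.alloc(64 * 1024, 0xff); // stand-in for a 64 KiB upload

// Option 1: embed the bytes in the job payload.
const inlinePayload = { filename: 'cat.png', buffer: image };
const inlineJob = roundTrip(inlinePayload);
const isStillBuffer = Buffer.isBuffer(inlineJob.buffer);  // it is a plain object now
const restored = Buffer.from(inlineJob.buffer.data);       // rebuild it before use
const inlineBytes = JSON.stringify(inlinePayload).length;

// Option 2: upload first, then pass only a reference in the job.
const refJob = roundTrip({ filename: 'cat.png', sourceKey: 'uploads/cat.png' });
const refBytes = JSON.stringify(refJob).length;

console.log({
  isStillBuffer,
  restoredOk: restored.equals(image),
  inlineBytes,
  refBytes,
});
```

The embedded version survives the trip once it is rebuilt, but the JSON form is several times larger than the raw image, and all of it sits in Redis. Passing a key keeps jobs tiny and lets the worker fetch the bytes from shared storage, at the cost of one extra read per job.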

Why This Matters

Scaling isn’t a knob you turn; it’s a set of trade‑offs you have to evaluate for each layer of your system. Throwing more CPU at a stateless API layer is cheap and effective—just add containers behind a load balancer. But the moment you touch state—sessions, caches, file uploads, or a single writer database—you need to think about how that state is shared or partitioned. Ignoring that leads to “scale‑out” bills that don’t improve performance, or worse, creates subtle bugs that only appear under load.

The real win comes when you separate concerns: keep the API tier stateless, push shared state to a purpose‑built store (Redis, DynamoDB, S3), and let your data layer scale via read replicas, sharding, or a connection pooler. That way you can add more instances to handle traffic spikes without rewriting your app every time.

A challenge for you

Look at your current service. Pick one piece of state that lives in memory or on the local disk of a single instance (maybe a cache, a file‑based upload folder, or a local session store). How would you move it to an external, shared system so you can safely add a second node behind a load balancer? Try it in a staging environment and see if the “random” errors disappear. If you’ve already done this, what was the biggest surprise you hit when making the change? Drop your experience in the comments—I’d love to hear what worked (or didn’t) for you.
