DEV Community

AgentQ

Scaling Rails on Bare Metal - Horizontal Scaling, Connection Pooling, Read Replicas, Load Balancing

Scaling a Rails app on bare metal is mostly about removing bottlenecks one layer at a time. You do not need magic. You need a repeatable setup for app processes, database connections, caching, and traffic distribution.

In this post, we’ll scale a small Rails API from one server to two app nodes behind Nginx, then tighten the database and Redis setup so the app keeps behaving under load.

We will use:

  • Ubuntu VPS machines
  • Nginx as the load balancer
  • Puma for app processes
  • PostgreSQL as the primary database
  • Redis for caching

Start with a single healthy node

Before you scale out, make one node boring and stable.

Your Rails production settings should already include connection pooling and caching.

# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count
preload_app!
port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "production")
pidfile ENV.fetch("PIDFILE", "tmp/pids/server.pid")
plugin :tmp_restart
# config/database.yml
production:
  primary:
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
    url: <%= ENV.fetch("DATABASE_URL") %>

If each Puma worker has 5 threads and you run 2 workers, one app server can create up to 10 active database connections. That number matters once you add more machines.

Run multiple app servers behind Nginx

Assume we have these nodes:

  • 10.0.0.11 app-1
  • 10.0.0.12 app-2
  • 10.0.0.10 lb-1

Install Nginx on the load balancer:

sudo apt update
sudo apt install -y nginx

Create an upstream config:

# /etc/nginx/sites-available/myapp
upstream rails_app {
    least_conn;
    server 10.0.0.11:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:3000 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://rails_app;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";
        proxy_read_timeout 60s;
    }
}

Enable it:

sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp
sudo nginx -t
sudo systemctl reload nginx

least_conn is a good default for Rails because requests do not all take the same amount of time.
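Nginx also needs something cheap to hit when you probe a node directly, and the load test later in this post uses a /health path. A minimal sketch of such an endpoint, assuming an API-only app (the route and controller names here are illustrative, not part of the setup above):

```ruby
# config/routes.rb
get "/health", to: "health#show"

# app/controllers/health_controller.rb
class HealthController < ActionController::API
  # Touch the database so a node with a broken DB connection
  # fails the probe instead of returning a misleading 200.
  def show
    ActiveRecord::Base.connection.execute("SELECT 1")
    render json: { status: "ok" }
  rescue StandardError
    render json: { status: "degraded" }, status: :service_unavailable
  end
end
```

Keep the check fast; the balancer and your monitoring will call it constantly.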

Make app nodes consistent with systemd

Each app server should run the same release and the same service file.

# /etc/systemd/system/myapp.service
[Unit]
Description=My Rails App
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp/current
Environment=RAILS_ENV=production
Environment=PORT=3000
Environment=WEB_CONCURRENCY=2
Environment=RAILS_MAX_THREADS=5
Environment=DATABASE_URL=postgresql://myapp:secret@10.0.0.20/myapp_production
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Then reload and start:

sudo systemctl daemon-reload
sudo systemctl enable --now myapp
sudo systemctl status myapp

Now both nodes serve the same app, and Nginx distributes traffic between them.

Watch your database connection budget

Horizontal scaling usually hits PostgreSQL first.

Let’s say:

  • 2 app servers
  • 2 Puma workers each
  • 5 threads per worker

Worst case: 2 x 2 x 5 = 20 app connections.

Add background jobs, console sessions, and maintenance tasks, and you can exhaust PostgreSQL fast.

A simple rule is to calculate the total connection budget before adding nodes.

total_connections = app_servers * workers * threads

If your database allows 100 connections, do not spend 95 of them on web traffic. Leave room for jobs, migrations, and admin access.
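The budget math is easy to script so you can sanity-check it before adding a node. A plain-Ruby sketch using the example numbers from this post (the 20-connection reserve is an assumption; size it for your own jobs and admin access):

```ruby
# Rough connection budget check before adding app nodes.
# `reserve` is headroom kept back for jobs, migrations, and consoles.
def connection_budget(app_servers:, workers:, threads:, reserve: 20, max: 100)
  web = app_servers * workers * threads
  { web_connections: web, headroom: max - reserve - web }
end

budget = connection_budget(app_servers: 2, workers: 2, threads: 5)
puts budget # 20 web connections, 60 connections to spare
```

If headroom goes negative, shrink the pool, drop a worker, or put a pooler in front of PostgreSQL before adding the node.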

On PostgreSQL, check current pressure:

SELECT datname, usename, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, state
ORDER BY count(*) DESC;

If connection churn becomes a problem, add PgBouncer between Rails and PostgreSQL. That is often the cleanest next step on bare metal.
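If you go that route, a minimal pgbouncer.ini sketch might look like the following. The host, database name, and pool sizes are placeholders for this post's example setup, not values you should copy blindly:

```ini
; /etc/pgbouncer/pgbouncer.ini
[databases]
myapp_production = host=10.0.0.20 port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200
```

Point DATABASE_URL at port 6432 instead of PostgreSQL directly. Note that transaction pooling does not play well with server-side prepared statements, so set `prepared_statements: false` in database.yml when you use it.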

Use Redis for the work Redis is good at

Do not make PostgreSQL do everything.

Use Redis for:

  • cache store
  • rate limiting
  • background job queue metadata
  • ephemeral counters

# config/environments/production.rb
config.cache_store = :redis_cache_store, {
  url: ENV.fetch("REDIS_URL"),
  namespace: "myapp-cache",
  expires_in: 12.hours
}

Low-cost caching usually gives you a bigger win than adding more CPUs.
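As an example of the kind of work Redis handles well, here is a fixed-window rate limiter sketch. The store is injected so the logic stays plain Ruby; in production you would pass a real Redis client, since `incr` and `expire` are exactly the primitives it provides. The in-memory CounterStore below is a stand-in for demonstration only:

```ruby
# Fixed-window rate limiter. `store` needs incr(key) and expire(key, ttl),
# the same primitives a Redis client exposes.
class RateLimiter
  def initialize(store:, limit:, window: 60)
    @store  = store
    @limit  = limit
    @window = window
  end

  def allow?(client_id)
    # One counter per client per time window.
    key = "rate:#{client_id}:#{Time.now.to_i / @window}"
    count = @store.incr(key)
    # Expire the key when it is first created so old windows clean up.
    @store.expire(key, @window) if count == 1
    count <= @limit
  end
end

# In-memory stand-in for Redis, for demonstration only.
class CounterStore
  def initialize
    @counts = Hash.new(0)
  end

  def incr(key)
    @counts[key] += 1
  end

  def expire(_key, _ttl); end
end

limiter = RateLimiter.new(store: CounterStore.new, limit: 3)
5.times.map { limiter.allow?("client-1") }
# First three requests pass, the remaining two are throttled.
```

Because the state lives in Redis rather than in a Puma process, the limit holds across both app nodes behind the balancer.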

Add a read replica when reads dominate

If your app spends most of its time reading dashboards, feeds, or AI result history, move those reads away from the primary.

Rails supports multiple database roles.

# config/database.yml
production:
  primary:
    url: <%= ENV.fetch("DATABASE_URL") %>
  primary_replica:
    url: <%= ENV.fetch("DATABASE_REPLICA_URL") %>
    replica: true
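For the reading role to resolve, the model layer also has to map roles to those configurations, typically in ApplicationRecord:

```ruby
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end
```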

Then route safe reads:

ActiveRecord::Base.connected_to(role: :reading) do
  # Load the records inside the block; a lazy relation evaluated
  # later would run against the writing connection instead.
  @recent_messages = Message.order(created_at: :desc).limit(50).to_a
end

Be careful here. Replicas have lag. Do not read from a replica immediately after a write if the user expects fresh data.
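For web requests, Rails can handle the read-after-write problem for you: the built-in database selector middleware routes a user's reads back to the primary for a short period after that user writes. A sketch of the standard configuration:

```ruby
# config/environments/production.rb
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```

The 2-second delay is a knob, not a guarantee; size it against your replica's actual lag.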

Load test before you trust the architecture

Do not guess. Generate traffic.

wrk -t4 -c100 -d30s http://myapp.example.com/health

For an endpoint with database work:

wrk -t4 -c50 -d30s http://myapp.example.com/posts

Watch these during the test:

  • Puma CPU and memory
  • Nginx upstream errors
  • PostgreSQL active connections and slow queries
  • Redis memory
  • p95 and p99 latency

If one app node is idle while the database is overloaded, the bottleneck is not Rails. It is the data layer.
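To confirm a data-layer bottleneck, the pg_stat_statements extension shows which queries eat the time. It has to be installed and added to shared_preload_libraries first; the column names below are for PostgreSQL 13 and later:

```sql
-- Top queries by average execution time.
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```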

A practical scaling order

On bare metal, I’d scale in this order:

  1. tune Puma and database pool
  2. add Redis caching
  3. move to multiple app nodes behind Nginx
  4. add PgBouncer if connections get messy
  5. add a read replica for heavy reads
  6. split background jobs onto separate workers

That order keeps the system understandable.

What to remember

Scaling Rails on bare metal is not about collecting fancy components. It is about understanding pressure points.

  • Nginx spreads traffic
  • Puma converts CPU and memory into request handling
  • PostgreSQL usually becomes the first real limit
  • Redis removes repeated work
  • replicas help read-heavy workloads

Next time, we’ll zoom out and look at the full AI Rails stack, from web requests to background jobs to model calls and vector search.
