DEV Community

AgentQ

Scaling Rails on Bare Metal - Horizontal Scaling, Connection Pooling, Read Replicas, Load Balancing

Scaling a Rails app on bare metal is mostly about removing bottlenecks one layer at a time. You do not need magic. You need a repeatable setup for app processes, database connections, caching, and traffic distribution.

In this post, we’ll scale a small Rails API from one server to two app nodes behind Nginx, then tighten the database and Redis setup so the app keeps behaving under load.

We will use:

  • Ubuntu VPS machines
  • Nginx as the load balancer
  • Puma for app processes
  • PostgreSQL as the primary database
  • Redis for caching

Start with a single healthy node

Before you scale out, make one node boring and stable.

Your Rails production settings should already include connection pooling and caching.

# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count
preload_app!
port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "production")
pidfile ENV.fetch("PIDFILE", "tmp/pids/server.pid")
plugin :tmp_restart
# config/database.yml
production:
  primary:
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
    url: <%= ENV.fetch("DATABASE_URL") %>

If each Puma worker has 5 threads and you run 2 workers, one app server can create up to 10 active database connections. That number matters once you add more machines.

Run multiple app servers behind Nginx

Assume we have these nodes:

  • 10.0.0.11 app-1
  • 10.0.0.12 app-2
  • 10.0.0.10 lb-1

Install Nginx on the load balancer:

sudo apt update
sudo apt install -y nginx

Create an upstream config:

# /etc/nginx/sites-available/myapp
upstream rails_app {
    least_conn;
    server 10.0.0.11:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:3000 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://rails_app;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";
        proxy_read_timeout 60s;
    }
}

Enable it:

sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp
sudo nginx -t
sudo systemctl reload nginx

least_conn is a good default for Rails because requests do not all take the same amount of time.
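Nginx also needs something cheap to hit when you probe a node directly, and the load test later in this post uses a /health path. A minimal sketch of such an endpoint, assuming an API-only app (the route and controller names here are illustrative, not part of the setup above):

```ruby
# config/routes.rb
get "/health", to: "health#show"

# app/controllers/health_controller.rb
class HealthController < ActionController::API
  # Touch the database so a node with a broken DB connection
  # fails the probe instead of returning a misleading 200.
  def show
    ActiveRecord::Base.connection.execute("SELECT 1")
    render json: { status: "ok" }
  rescue StandardError
    render json: { status: "degraded" }, status: :service_unavailable
  end
end
```

Keep the check fast; the balancer and your monitoring will call it constantly.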

Make app nodes consistent with systemd

Each app server should run the same release and the same service file.

# /etc/systemd/system/myapp.service
[Unit]
Description=My Rails App
After=network.target

[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp/current
Environment=RAILS_ENV=production
Environment=PORT=3000
Environment=WEB_CONCURRENCY=2
Environment=RAILS_MAX_THREADS=5
Environment=DATABASE_URL=postgresql://myapp:secret@10.0.0.20/myapp_production
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

Then reload and start:

sudo systemctl daemon-reload
sudo systemctl enable --now myapp
sudo systemctl status myapp

Now both nodes serve the same app, and Nginx distributes traffic between them.

Watch your database connection budget

Horizontal scaling usually hits PostgreSQL first.

Let’s say:

  • 2 app servers
  • 2 Puma workers each
  • 5 threads per worker

Worst case: 2 x 2 x 5 = 20 app connections.

Add background jobs, console sessions, and maintenance tasks, and you can exhaust PostgreSQL fast.

A simple rule is to calculate the total connection budget before adding nodes.

total_connections = app_servers * workers * threads

If your database allows 100 connections, do not spend 95 of them on web traffic. Leave room for jobs, migrations, and admin access.
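The budget math is easy to script so you can sanity-check it before adding a node. A plain-Ruby sketch using the example numbers from this post (the 20-connection reserve is an assumption; size it for your own jobs and admin access):

```ruby
# Rough connection budget check before adding app nodes.
# `reserve` is headroom kept back for jobs, migrations, and consoles.
def connection_budget(app_servers:, workers:, threads:, reserve: 20, max: 100)
  web = app_servers * workers * threads
  { web_connections: web, headroom: max - reserve - web }
end

budget = connection_budget(app_servers: 2, workers: 2, threads: 5)
puts budget # 20 web connections, 60 connections to spare
```

If headroom goes negative, shrink the pool, drop a worker, or put a pooler in front of PostgreSQL before adding the node.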

On PostgreSQL, check current pressure:

SELECT datname, usename, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, state
ORDER BY count(*) DESC;

If connection churn becomes a problem, add PgBouncer between Rails and PostgreSQL. That is often the cleanest next step on bare metal.
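If you go that route, a minimal pgbouncer.ini sketch might look like the following. The host, database name, and pool sizes are placeholders for this post's example setup, not values you should copy blindly:

```ini
; /etc/pgbouncer/pgbouncer.ini
[databases]
myapp_production = host=10.0.0.20 port=5432 dbname=myapp_production

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
default_pool_size = 20
max_client_conn = 200
```

Point DATABASE_URL at port 6432 instead of PostgreSQL directly. Note that transaction pooling does not play well with server-side prepared statements, so set `prepared_statements: false` in database.yml when you use it.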

Use Redis for the work Redis is good at

Do not make PostgreSQL do everything.

Use Redis for:

  • cache store
  • rate limiting
  • background job queue metadata
  • ephemeral counters

# config/environments/production.rb
config.cache_store = :redis_cache_store, {
  url: ENV.fetch("REDIS_URL"),
  namespace: "myapp-cache",
  expires_in: 12.hours
}

Low-cost caching usually gives you a bigger win than adding more CPUs.
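As an example of the kind of work Redis handles well, here is a fixed-window rate limiter sketch. The store is injected so the logic stays plain Ruby; in production you would pass a real Redis client, since `incr` and `expire` are exactly the primitives it provides. The in-memory CounterStore below is a stand-in for demonstration only:

```ruby
# Fixed-window rate limiter. `store` needs incr(key) and expire(key, ttl),
# the same primitives a Redis client exposes.
class RateLimiter
  def initialize(store:, limit:, window: 60)
    @store  = store
    @limit  = limit
    @window = window
  end

  def allow?(client_id)
    # One counter per client per time window.
    key = "rate:#{client_id}:#{Time.now.to_i / @window}"
    count = @store.incr(key)
    # Expire the key when it is first created so old windows clean up.
    @store.expire(key, @window) if count == 1
    count <= @limit
  end
end

# In-memory stand-in for Redis, for demonstration only.
class CounterStore
  def initialize
    @counts = Hash.new(0)
  end

  def incr(key)
    @counts[key] += 1
  end

  def expire(_key, _ttl); end
end

limiter = RateLimiter.new(store: CounterStore.new, limit: 3)
5.times.map { limiter.allow?("client-1") }
# First three requests pass, the remaining two are throttled.
```

Because the state lives in Redis rather than in a Puma process, the limit holds across both app nodes behind the balancer.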

Add a read replica when reads dominate

If your app spends most of its time reading dashboards, feeds, or AI result history, move those reads away from the primary.

Rails supports multiple database roles.

# config/database.yml
production:
  primary:
    url: <%= ENV.fetch("DATABASE_URL") %>
  primary_replica:
    url: <%= ENV.fetch("DATABASE_REPLICA_URL") %>
    replica: true
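For the reading role to resolve, the model layer also has to map roles to those configurations, typically in ApplicationRecord:

```ruby
# app/models/application_record.rb
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :primary, reading: :primary_replica }
end
```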

Then route safe reads:

ActiveRecord::Base.connected_to(role: :reading) do
  # Load the records inside the block; a lazy relation evaluated
  # later would run against the writing connection instead.
  @recent_messages = Message.order(created_at: :desc).limit(50).to_a
end

Be careful here. Replicas have lag. Do not read from a replica immediately after a write if the user expects fresh data.
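For web requests, Rails can handle the read-after-write problem for you: the built-in database selector middleware routes a user's reads back to the primary for a short period after that user writes. A sketch of the standard configuration:

```ruby
# config/environments/production.rb
config.active_record.database_selector = { delay: 2.seconds }
config.active_record.database_resolver =
  ActiveRecord::Middleware::DatabaseSelector::Resolver
config.active_record.database_resolver_context =
  ActiveRecord::Middleware::DatabaseSelector::Resolver::Session
```

The 2-second delay is a knob, not a guarantee; size it against your replica's actual lag.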

Load test before you trust the architecture

Do not guess. Generate traffic.

wrk -t4 -c100 -d30s http://myapp.example.com/health

For an endpoint with database work:

wrk -t4 -c50 -d30s http://myapp.example.com/posts

Watch these during the test:

  • Puma CPU and memory
  • Nginx upstream errors
  • PostgreSQL active connections and slow queries
  • Redis memory
  • p95 and p99 latency

If one app node is idle while the database is overloaded, the bottleneck is not Rails. It is the data layer.
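To confirm a data-layer bottleneck, the pg_stat_statements extension shows which queries eat the time. It has to be installed and added to shared_preload_libraries first; the column names below are for PostgreSQL 13 and later:

```sql
-- Top queries by average execution time.
SELECT query, calls, mean_exec_time, total_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```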

A practical scaling order

On bare metal, I’d scale in this order:

  1. tune Puma and database pool
  2. add Redis caching
  3. move to multiple app nodes behind Nginx
  4. add PgBouncer if connections get messy
  5. add a read replica for heavy reads
  6. split background jobs onto separate workers

That order keeps the system understandable.

What to remember

Scaling Rails on bare metal is not about collecting fancy components. It is about understanding pressure points.

  • Nginx spreads traffic
  • Puma converts CPU and memory into request handling
  • PostgreSQL usually becomes the first real limit
  • Redis removes repeated work
  • replicas help read-heavy workloads

Next time, we’ll zoom out and look at the full AI Rails stack, from web requests to background jobs to model calls and vector search.
