Scaling a Rails app on bare metal is mostly about removing bottlenecks one layer at a time. You do not need magic. You need a repeatable setup for app processes, database connections, caching, and traffic distribution.
In this post, we’ll scale a small Rails API from one server to two app nodes behind Nginx, then tighten the database and Redis setup so the app keeps behaving under load.
We will use:
- Ubuntu VPS machines
- Nginx as the load balancer
- Puma for app processes
- PostgreSQL as the primary database
- Redis for caching
Start with a single healthy node
Before you scale out, make one node boring and stable.
Your Rails production settings should already include connection pooling and caching.
# config/puma.rb
workers Integer(ENV.fetch("WEB_CONCURRENCY", 2))
threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
threads threads_count, threads_count
preload_app!
port ENV.fetch("PORT", 3000)
environment ENV.fetch("RAILS_ENV", "production")
pidfile ENV.fetch("PIDFILE", "tmp/pids/server.pid")
plugin :tmp_restart
# config/database.yml
production:
  primary:
    adapter: postgresql
    encoding: unicode
    pool: <%= ENV.fetch("RAILS_MAX_THREADS", 5) %>
    url: <%= ENV.fetch("DATABASE_URL") %>
If each Puma worker has 5 threads and you run 2 workers, one app server can create up to 10 active database connections. That number matters once you add more machines.
Run multiple app servers behind Nginx
Assume we have these nodes:
- app-1: 10.0.0.11
- app-2: 10.0.0.12
- lb-1: 10.0.0.10
Install Nginx on the load balancer:
sudo apt update
sudo apt install -y nginx
Create an upstream config:
# /etc/nginx/sites-available/myapp
upstream rails_app {
    least_conn;
    server 10.0.0.11:3000 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:3000 max_fails=3 fail_timeout=30s;
    keepalive 32;
}

server {
    listen 80;
    server_name myapp.example.com;

    location / {
        proxy_pass http://rails_app;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header Connection "";
        proxy_read_timeout 60s;
    }
}
Enable it:
sudo ln -s /etc/nginx/sites-available/myapp /etc/nginx/sites-enabled/myapp
sudo nginx -t
sudo systemctl reload nginx
least_conn is a good default for Rails because requests do not all take the same amount of time.
Make app nodes consistent with systemd
Each app server should run the same release and the same service file.
# /etc/systemd/system/myapp.service
[Unit]
Description=My Rails App
After=network.target
[Service]
Type=simple
User=deploy
WorkingDirectory=/var/www/myapp/current
Environment=RAILS_ENV=production
Environment=PORT=3000
Environment=WEB_CONCURRENCY=2
Environment=RAILS_MAX_THREADS=5
Environment=DATABASE_URL=postgresql://myapp:secret@10.0.0.20/myapp_production
ExecStart=/usr/local/bin/bundle exec puma -C config/puma.rb
Restart=always
RestartSec=5
[Install]
WantedBy=multi-user.target
Then reload and start:
sudo systemctl daemon-reload
sudo systemctl enable --now myapp
sudo systemctl status myapp
Now both nodes serve the same app, and Nginx distributes traffic between them.
Watch your database connection budget
Horizontal scaling usually hits PostgreSQL first.
Let’s say:
- 2 app servers
- 2 Puma workers each
- 5 threads per worker
Worst case: 2 x 2 x 5 = 20 app connections.
Add background jobs, console sessions, and maintenance tasks, and you can exhaust PostgreSQL fast.
A simple rule is to calculate the total connection budget before adding nodes.
total_connections = app_servers * workers * threads
If your database allows 100 connections, do not spend 95 of them on web traffic. Leave room for jobs, migrations, and admin access.
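That budget rule can be sketched as a quick back-of-the-envelope check. The `max_connections` value and the 25-connection reserve below are assumptions for this example, not Rails or PostgreSQL recommendations:

```ruby
# Back-of-the-envelope connection budget: worst-case web-tier connections
# versus what the database actually allows, minus a safety reserve.
def web_connection_budget(app_servers:, workers:, threads:)
  app_servers * workers * threads
end

web = web_connection_budget(app_servers: 2, workers: 2, threads: 5)
max_connections = 100 # PostgreSQL's default max_connections
reserve = 25          # headroom for jobs, migrations, and admin sessions

puts web                              # => 20
puts web <= max_connections - reserve # => true, so there is room to grow
```

Re-run this math every time you change `WEB_CONCURRENCY`, `RAILS_MAX_THREADS`, or the node count; any of the three multiplies the total.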
On PostgreSQL, check current pressure:
SELECT datname, usename, state, count(*)
FROM pg_stat_activity
GROUP BY datname, usename, state
ORDER BY count(*) DESC;
If connection churn becomes a problem, add PgBouncer between Rails and PostgreSQL. That is often the cleanest next step on bare metal.
Use Redis for the work Redis is good at
Do not make PostgreSQL do everything.
Use Redis for:
- cache store
- rate limiting
- background job queue metadata
- ephemeral counters
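To make the rate-limiting case concrete, here is a fixed-window limiter sketched in the style you would run on Redis (`INCR` a per-window bucket, `EXPIRE` it on first use). The `MemoryStore` class is a hypothetical in-memory stand-in so the logic runs without a Redis server; in the app you would pass a real Redis client instead:

```ruby
# Fixed-window rate limiter: count requests per key per time window,
# allow while the count stays at or under the limit.
class RateLimiter
  def initialize(store, limit:, window_seconds:)
    @store = store
    @limit = limit
    @window = window_seconds
  end

  # Returns true if this key may proceed in the current window.
  def allow?(key, now: Time.now)
    bucket = "rl:#{key}:#{now.to_i / @window}"
    count = @store.incr(bucket)
    @store.expire(bucket, @window) if count == 1 # set TTL on first hit
    count <= @limit
  end
end

# Hypothetical in-memory stand-in for a Redis client (illustration only).
class MemoryStore
  def initialize
    @counts = Hash.new(0)
  end

  def incr(key)
    @counts[key] += 1
  end

  def expire(_key, _seconds)
    # Redis would set a TTL here; the stand-in never forgets.
  end
end

limiter = RateLimiter.new(MemoryStore.new, limit: 3, window_seconds: 60)
t = Time.now
p 4.times.map { limiter.allow?("10.0.0.99", now: t) } # => [true, true, true, false]
```

On real Redis, `INCR` and `EXPIRE` are each atomic, which is good enough for this pattern; wrap them in a Lua script if you need the pair to be atomic together.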
# config/environments/production.rb
config.cache_store = :redis_cache_store, {
  url: ENV.fetch("REDIS_URL"),
  namespace: "myapp-cache",
  expires_in: 12.hours
}
Low-cost caching usually gives you a bigger win than adding more CPUs.
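To see why, here is what `Rails.cache.fetch` buys you, sketched in plain Ruby. This is a stand-in for illustration, not the real store, but the semantics match: the block runs once, then the stored value is served until the TTL passes:

```ruby
# Plain-Ruby sketch of fetch-with-TTL semantics: compute once, reuse the
# stored value until it expires.
class TTLCache
  Entry = Struct.new(:value, :expires_at)

  def initialize
    @entries = {}
  end

  def fetch(key, expires_in:)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > Time.now

    value = yield # cache miss: run the expensive block
    @entries[key] = Entry.new(value, Time.now + expires_in)
    value
  end
end

cache = TTLCache.new
computations = 0
2.times { cache.fetch("posts/recent", expires_in: 60) { computations += 1; :rows } }
p computations # => 1, the second read came from the cache
```

Every request served from the cache is a database query your connection budget never has to pay for.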
Add a read replica when reads dominate
If your app spends most of its time reading dashboards, feeds, or AI result history, move those reads away from the primary.
Rails supports multiple database roles.
# config/database.yml
production:
  primary:
    url: <%= ENV.fetch("DATABASE_URL") %>
  primary_replica:
    url: <%= ENV.fetch("DATABASE_REPLICA_URL") %>
    replica: true
Then route safe reads:
ActiveRecord::Base.connected_to(role: :reading) do
  @recent_messages = Message.order(created_at: :desc).limit(50)
end
Be careful here. Replicas have lag. Do not read from a replica immediately after a write if the user expects fresh data.
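One common mitigation, sketched under assumed names: after a user writes, pin their reads to the primary for a short window. The `PIN_TO_PRIMARY_FOR` constant and the session key are illustrative choices for this sketch, not Rails conventions:

```ruby
# After a write, route the same user's reads to the primary for a short
# window so they never see stale replica data. The 5-second window is an
# assumption; size it above your observed replication lag.
PIN_TO_PRIMARY_FOR = 5 # seconds

def record_write!(session, now: Time.now)
  session[:last_write_at] = now.to_f
end

def reading_role(session, now: Time.now)
  last_write = session[:last_write_at]
  if last_write && now.to_f - last_write < PIN_TO_PRIMARY_FOR
    :writing # fresh write: stay on the primary
  else
    :reading # safe to hit the replica
  end
end

# In a controller, wrap the query block accordingly:
# ActiveRecord::Base.connected_to(role: reading_role(session)) { ... }
```
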
Load test before you trust the architecture
Do not guess. Generate traffic.
wrk -t4 -c100 -d30s http://myapp.example.com/health
For an endpoint with database work:
wrk -t4 -c50 -d30s http://myapp.example.com/posts
Watch these during the test:
- Puma CPU and memory
- Nginx upstream errors
- PostgreSQL active connections and slow queries
- Redis memory
- p95 and p99 latency
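wrk reports those percentiles for you; if you also log per-request latencies in the app, the same numbers are cheap to compute yourself. A nearest-rank sketch with made-up sample data:

```ruby
# Nearest-rank percentile: sort the samples, then index into the sorted list.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct / 100.0 * sorted.length).ceil - 1
  sorted[[rank, 0].max]
end

# Illustrative latencies in milliseconds; the tail hides from the median.
latencies_ms = [12, 15, 14, 200, 18, 16, 13, 950, 17, 14]
p percentile(latencies_ms, 50) # => 15
p percentile(latencies_ms, 99) # => 950
```

This is why p95 and p99 belong on the watch list: the median can look healthy while a slow query or GC pause punishes one request in twenty.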
If one app node is idle while the database is overloaded, the bottleneck is not Rails. It is the data layer.
A practical scaling order
On bare metal, I’d scale in this order:
- tune Puma and database pool
- add Redis caching
- move to multiple app nodes behind Nginx
- add PgBouncer if connections get messy
- add a read replica for heavy reads
- split background jobs onto separate workers
That order keeps the system understandable.
What to remember
Scaling Rails on bare metal is not about collecting fancy components. It is about understanding pressure points.
- Nginx spreads traffic
- Puma converts CPU and memory into request handling
- PostgreSQL usually becomes the first real limit
- Redis removes repeated work
- replicas help read-heavy workloads
Next time, we’ll zoom out and look at the full AI Rails stack, from web requests to background jobs to model calls and vector search.