Over the last year I have been getting random ActiveRecord::ConnectionTimeoutError errors, where the application could not obtain a connection from the pool within 5 seconds. But I'm stumped as to what the actual issue is...
- we're using an AWS RDS instance that can handle hundreds of connections, which we never even come close to by the RDS connection log (max use is ~30).
- I have been over forums that talk about using Thread.new... but we don't spin any of our own threads.
- I've seen forums that talk about workers using more pool connections than necessary, but I haven't set up any puma workers, we just use web instances!
- There is no consistency in which controller#action throws the error, and no consistency on time of day or week.
- Sometimes the error will not show up for weeks, and then I'll get 15 instances of it in a few days
I'm at a loss as to where to even start looking...
Top comments (8)
Hey Ethan,
If you setup Rails using the puma option, Rails uses a default thread pool of 5 for puma (config/puma.rb, see max_threads_count), and a connection pool of 5 (config/database.yml, see default->pool). That means effectively that 5 puma threads share 5 database connections.
In some instances, Puma can process more than 5 requests at once (if something is blocked by IO, it can pause that and execute another). So during very busy times (when more than 5 requests are processing at once), if the 5 db connections are all busy with something else (usually because of slow SQL queries), you'll get that error when the puma thread has to wait over 5 seconds for a free DB connection.
The easiest fix is to increase the DB connection pool (usually I'd do three times the amount of puma threads). Alternatively, decrease the max threads in puma from 5 to 3 (this means however that you'll not be able to serve as many requests per second). Since you're on AWS with connection max of at least 100, increasing the DB connection pool to 15 is the better option here.
The other thing to do is optimise your queries. If the app can't get a DB connection for 5 seconds, your database queries are taking way too long and not returning connections to the pool fast enough. You should aim to have your queries returning within 50ms or less usually.
Hope this information helps.
"In some instances, Puma can process more than 5 requests at once" Whaaaat? So having a max_threads_count of 5 isn't actually a MAX of 5 threads? That's good to know! I've been assuming that I could count on that number.
It is a max of 5 threads, but not a guaranteed max of 1 request per thread, especially when using different ruby interpreters without a GIL. At least that is what I've seen in my own use of Puma, and I'm sure someone more familiar with Puma could provide a more detailed explanation. The short answer anyway is: keep your DB pool size higher than the puma thread count to avoid potential issues
Also one more thing to keep in mind when using Puma, because it uses threads, you have to be very careful to make your code threadsafe. Code something the wrong way, and you could end up leaking data from one thread to another; i.e. someone could end up seeing someone elses data.
If you're an application that requires absolute assurance of no data leakage (e.g. bank, insurance, health, etc etc), avoid having to worry about threadsafety by switching from Puma threads (1 process, 5 threads each) to Puma workers (5 processes, 1 thread each).
Been a while since I messed with Rails, but there're usually a few culprits here.
1) What did you configure your connection pool to be in Rails? Your Rails application uses a connection pool to connect to whatever, so if you set Rails to only have 2 connections in its pool that's all it can have and will fail if 2 aren't enough based on the demand. This has nothing to do with your underlying database.
2) What is happening with the connections that are in use? While many people will naively raise their pool numbers, this hits a limit where it still doesn't quite stabilize things and becomes clear it isn't the pool. That is to say if you have queries or odd things happening where your Rails application cannot "release" something back to the pool then you'll always run out. This one can be a little trickier to debug, but is almost always one of the culprits.
I know this isn't as nice an answer as, "Turn this option on," but in my experience with other connection pooling patterns these are the two things that are always at fault.
1) I've been setting my connection pool to my thread count... but based on an answer above, it seems like that may not be enough sometimes. I'm going to try increasing it, we'll see how that goes...
2) I will definitely be digging through my queries to see if I can optimize things.
Thanks!
Could you post somewhere your puma settings? A good practice is to basically multiply thread numbers by process numbers and then leave a margin of free connections to avoid getting to the limit.
My advice is to read this article and implement the suggestions, it has never failed me in the past :D
Configuring Puma, Unicorn and Passenger for Maximum Efficiency
I'm familiar with this, but I forget exactly the path going down to deal with it, so I'll throw some stuff out.
This post seems like an interesting analysis: Is there anything in this post you haven't considered yet?
Active Record, Sidekiq, pools and threads
appaloosa.io γ» Feb 7 '19 γ» 5 min read
I suspect that this could have to do with how the threads, pool size, and process multiply against each other if you're using Puma, or generally be in this veign... Maybe look into the knobs you can pull with Puma? (Or whichever server you're running?)
speedshop.co/2017/10/12/appserver....
Hey Ben, thanks for the reply! I poked at the thread issue, but we're not using sidekiq or any asynchronous workers, and we're not using any custom threading. Everything "should be" synchronous to our web instances?
We are using puma, and I'll take a further look into the puma knobs I can play with!