This is a brief retro on some downtime we experienced at dev.to in the past day.
I posted this in the process of investigating the issue:
...
For further actions, you may consider blocking this person and/or reporting abuse
If you haven't already it would probably make sense to add connection pooling. I believe Heroku allows you to funnel all of your connections through pgbouncer. Postgres doesn't like having a lot of connections open so it really only scales vertically to a specific limit (connection wise).
In any case it is better to have less connections open since there is a fairly high overheard per postgres connection.
Additionally, if you have workers connecting as well as web servers you can pool them separately in pgbouncer to make sure one doesn't starve the other.
Good luck!
Yeah I think you're right. I feel like I wrongly assumed we weren't as close to this bottleneck as we were. These seem like the obvious next steps.
Were the connections that were pushing y'all over the limit strictly necessary? Meaning, is there an underlying issue with connections staying open for longer than they "need" to? I think I was seeing something similar recently and would love some insight.
It’s possible. I’d be happy to chat it out, either via #help or slip into my /connect to talk more.
Either way I might have some relevant thoughts for you.
DMing :)
I love these real-world scenario articles - thanks for sharing!
What do you use for monitoring? We use NewRelic, but the price doesn't scale well, so are looking at setting up Nagios.
We use Airbrake and Skylight and Timber as our main tools in this arena. In addition to pricing scale like you've mentioned, I ran into some issues with the script New Relic was injecting early on in the project when I was trying to optimize performance. There may have been other ways to include the service but I wound up just going in a different direction at the time and it's worked out well.
Glad you caught it!
A simple idea to improve the monitoring could be to create an alarm on logged database metrics.
With Standard tiers Heroku periodically logs a line with the statuts of the DB, in this line you have a key named:
sample#active-connections=N
where N is the number of opened connections.I think that with Papertrail (probably also other logging addons) you can set an alert on that string that sends the event every 10 minutes or 1 hour to your preferred destination (like a low key "metrics" channel on Slack).
I think I'm going to try it myself tomorrow :D
Yeah, we use Timber and do this sort of stuff but at the moment we've mostly just configured alerts around relatively known issues. We have the tools but we need to step up and really configure these things to be more useful.
Don't be too sad about it! It literally means that community grows. 😉
I'm recently diving into scaling apps... and yeah, most probably what you're looking for is a messaging system like RabbitMQ and SOA.
youtu.be/RhfyP8pEEc4