Hello! Need some help from the bright minds of dev.to 🙏 Recently I started to implement a backend for my project using flask + sqlalchemy (postgres + pgbouncer) + gunicorn. The problem is, that it performs only ~1 request per second... I started to play with the gunicorn config and set 5 gevent workers, 🙈 monkey patched psycorp2 (the driver for postgres). Didn't work. Okay, maybe sqlalchemy blocks the main thread and the app performs poorly. So I switched to 5 sync workers. Now at least it should give me 5 RPS... Not this time. After some investigation, I found out that the bottleneck was between my web app and the db. pg bouncer (47 connections) and postgres are hosted on digital ocean, so I believe the guys made the right settings for me. The only thing that is left is sqlalchemy and how it handles sessions (and my hands which wrote the code). Maybe somebody had a similar issue or can give advice where to look further? Thank you in advance, you're awesome 💚
For further actions, you may consider blocking this person and/or reporting abuse
Top comments (10)
What happens if you remove pgbouncer from the equation? 1 request per second seems a configuration problem.
You're saying your DB and pg bouncer are on Digital Ocean, but where the rest of the app is?
Hi, the rest of the app is also in digital ocean (droplet). I was yesterday playing with the gunicorn config and made a blazing 5 request per second... (with a fail rate ~20%) BTW, the testing software is locust. I could not believe that pgbouncer cannot handle more than that. Without pg bouncer the result is the same, that's super weird. I am not using raw sqlalchemy, just the flask-sqlalchemy, which closes connections and handles all the stuff under the hood.
It's pretty weird. What happens on localhost? Have you tried running both app and the DB locally and load test it?
You need a process of elimination :D
Yep, tried running on localhost.
Interesting observation - gunicorn 9 sync workers - the query time to db is around ~500ms, RPS is 5.
gunicorn + gevent - query time gets from 1 sec to 100 sec and it has the same RPS result as sync. I think psycorp2 monkey patching is doing a poor job and with gevent it blocks the main thread. sigh.
I honestly don't understand how the app is setup, can you describe it better?
You have a flask web app which runs on production with gunicorn but why did you choose to use gevent as well? Have you tried without it?
With psycorp2 you mean psycopg2? Why is it monkey patched? Because of gevent? Which version of PostgreSQL are you using?
What happens if you just start the app without gunicorn and gevent locally and you benchmark it? 5 requests per seconds is pretty low, what is your app doing?
Sorry but it's hard to help without information
Thank you rhymes for feedback! Basically, I found out that the bottleneck was the db, the query performance is 5 transactions per second. After query optimization to 10 rps I decided to cache some info in redis and update the cache in background jobs. But another problem came out (I think it's good karma 😇). Redis benchmark shows 50k rps, but in my endpoint it makes only 50rps 😫 Soo, I'm currently profiling the sh*t of the code.
The app setup is next ->
Sorry if I re-iterate: have you tried without pgbouncer? Because 10 rps is ridicously low. How many RPS do you measure without it?
You shouldn't need redis either, at least not as a patch for the problem.
The easiest way you can get out of this is to isolate each component.
What happens if you remove pgbouncer AND gevent?
I measure on localhost without any pgbouncer, using pgbench, so no gevent and pgbouncer.
Let's recap because you're giving out information in small pieces and I'm having difficulties following what's happening:
pgbench
from your localhost you're getting 5 RPSThere a few aspects to consider:
pgbench
tests PostgreSQL, not your app. It also tells you transactions per second, not requests per second.pgbench
correctly? Benchmarking can introduce methodology errors. How many connections are you opening? Can you provide the command line you run the test with?Finally, have you tried to actually load test the production app? Or at least a copy of it. With tools like wrk for example.
Hey,
Will it be possible for you to share your code base? Delete any business critical logic that you might have written.