<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Redash</title>
    <description>The latest articles on DEV Community by Redash (@redash).</description>
    <link>https://dev.to/redash</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F451%2F0b83138a-5e2f-4913-988b-f1d110033258.png</url>
      <title>DEV Community: Redash</title>
      <link>https://dev.to/redash</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/redash"/>
    <language>en</language>
    <item>
      <title>How We Spotted and Fixed a Performance Degradation in Our Python Code</title>
      <dc:creator>Omer Lachish</dc:creator>
      <pubDate>Tue, 05 Nov 2019 07:27:14 +0000</pubDate>
      <link>https://dev.to/redash/how-we-spotted-and-fixed-a-performance-degradation-in-our-python-code-4g5l</link>
      <guid>https://dev.to/redash/how-we-spotted-and-fixed-a-performance-degradation-in-our-python-code-4g5l</guid>
      <description>&lt;p&gt;Recently we’ve started transitioning from using &lt;a href="http://www.celeryproject.org/"&gt;Celery&lt;/a&gt; to using &lt;a href="https://python-rq.org/"&gt;RQ&lt;/a&gt; as our task running engine. For phase one, we only migrated the jobs that aren’t directly running queries. These jobs include things like sending emails, figuring out which queries need to be refreshed, recording user events and other maintenance jobs.&lt;/p&gt;

&lt;p&gt;After deploying this we noticed our RQ workers required significantly more CPU to perform the same amount of tasks as our Celery workers did. I thought I’d share how I profiled this and remedied the problem.&lt;/p&gt;

&lt;h2&gt;A word about the differences between Celery and RQ&lt;/h2&gt;

&lt;p&gt;Both Celery and RQ have the concept of a Worker process, and both use forking to allow parallelized execution of jobs. When you launch a Celery worker, it forks into several different processes, each of which handles tasks autonomously. With RQ, a worker instantiates only a single sub-process (known as a “Work Horse”), which performs a single job and then dies. When the worker fetches another job from the queue, it forks a new Work Horse.&lt;/p&gt;

&lt;p&gt;In RQ you can achieve the same parallelism as Celery simply by running more worker processes. However, there is a subtle difference between the two: Celery workers instantiate multiple subprocesses at launch time and reuse them for many tasks, while an RQ worker has to fork for every job. There are pros and cons to both approaches, but those are out of scope for this post.&lt;/p&gt;
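&lt;p&gt;To make the difference concrete, here is a minimal, hypothetical sketch of the fork-per-job model (the names are mine, not RQ’s actual internals; POSIX-only): the parent forks a short-lived child for each job and waits for it to exit before fetching the next one.&lt;/p&gt;

```python
import os

def run_job(job):
    # Stand-in for real job execution (e.g. record_event)
    return f"processed {job}"

def fork_per_job(jobs):
    """RQ-style loop: one short-lived 'work horse' child per job."""
    handled = 0
    for job in jobs:
        pid = os.fork()
        if pid == 0:
            run_job(job)   # the work horse performs exactly one job...
            os._exit(0)    # ...and then dies
        _, status = os.waitpid(pid, 0)  # parent waits, then forks again
        if os.WEXITSTATUS(status) == 0:
            handled += 1
    return handled
```

&lt;p&gt;A Celery-style prefork pool, by contrast, forks its children once at launch and feeds each of them many jobs, so any per-process setup cost is paid only once.&lt;/p&gt;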

&lt;h2&gt;Benchmarking&lt;/h2&gt;

&lt;p&gt;Before I profiled anything, I wanted a benchmark of how long it takes a worker container to process 1,000 jobs. I decided to focus on the &lt;code&gt;record_event&lt;/code&gt; job since it is a frequent, lightweight operation. I used the &lt;code&gt;time&lt;/code&gt; command to measure performance, which required a couple of changes to the source code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;To measure these 1,000 jobs I used RQ’s burst mode, which exits the process after handling all queued jobs.&lt;/li&gt;
&lt;li&gt;I wanted to avoid measuring other jobs that might be scheduled at the time of benchmarking, so I moved &lt;code&gt;record_event&lt;/code&gt; to a dedicated queue called &lt;code&gt;benchmark&lt;/code&gt; by replacing &lt;code&gt;@job('default')&lt;/code&gt; with &lt;code&gt;@job('benchmark')&lt;/code&gt; right above &lt;code&gt;record_event&lt;/code&gt;’s declaration in &lt;code&gt;tasks/general.py&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now we can start timing things. First, I wanted to see how long it takes a worker to start and stop without any jobs, so I could subtract that time from later results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker-compose exec worker bash -c "time ./manage.py rq workers 4 benchmark"

real    0m14.728s
user    0m6.810s
sys 0m2.750s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Worker initialization takes 14.7 seconds on my machine. I’ll remember that. Then, I shoved 1,000 dummy &lt;code&gt;record_event&lt;/code&gt; jobs onto the &lt;code&gt;benchmark&lt;/code&gt; queue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker-compose run --rm server manage shell &amp;lt;&amp;lt;&amp;lt; "from redash.tasks.general import record_event; [record_event.delay({ 'action': 'create', 'timestamp': 0, 'org_id': 1, 'user_id': 1, 'object_id': 0, 'object_type': 'dummy' }) for i in range(1000)]"
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Now let’s run the same worker command as before and see how long it takes to process 1,000 jobs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker-compose exec worker bash -c "time ./manage.py rq workers 4 benchmark"

real    1m57.332s
user    1m11.320s
sys 0m27.540s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Subtracting our 14.7-second boot time, we see that it took 4 workers 102 seconds to handle 1,000 jobs. Now let’s figure out why! For that, we’ll use &lt;code&gt;py-spy&lt;/code&gt; while our workers are working hard.&lt;/p&gt;

&lt;h2&gt;Profiling&lt;/h2&gt;

&lt;p&gt;Let’s add another 1,000 jobs (because our last measurement consumed all of them), run the workers and simultaneously spy on them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker-compose run --rm server manage shell &amp;lt;&amp;lt;&amp;lt; "from redash.tasks.general import record_event; [record_event.delay({ 'action': 'create', 'timestamp': 0, 'org_id': 1, 'user_id': 1, 'object_id': 0, 'object_type': 'dummy' }) for i in range(1000)]"
$ docker-compose exec worker bash -c 'nohup ./manage.py rq workers 4 benchmark &amp;amp; sleep 15 &amp;amp;&amp;amp; pip install py-spy &amp;amp;&amp;amp; rq info -u "redis://redis:6379/0" | grep busy | awk "{print $3}" | grep -o -P "\s\d+" | head -n 1 | xargs py-spy record -d 10 --subprocesses -o profile.svg -p'
$ open -a "Google Chrome" profile.svg
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;I know, that last command is quite a mouthful. Ideally I would break it at every &lt;code&gt;&amp;amp;&amp;amp;&lt;/code&gt; for readability, but the commands have to run sequentially inside the same &lt;code&gt;docker-compose exec worker bash&lt;/code&gt; session, so here’s a quick breakdown of what it does:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Start 4 burst workers in the background&lt;/li&gt;
&lt;li&gt;Wait 15 seconds (roughly enough for them to finish booting)&lt;/li&gt;
&lt;li&gt;Install &lt;code&gt;py-spy&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Run &lt;code&gt;rq info&lt;/code&gt; and extract the PID of one of the workers&lt;/li&gt;
&lt;li&gt;Record 10 seconds of activity in that process and save it to &lt;code&gt;profile.svg&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The result was this flame graph:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--yaA5mOfK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/-lffkrC85f44h7ixlMyBVVIqB2j-KZOLSeRoDu4MyTG28CYh3uKgMEJ_4iniEze9Gyee9cm6NCvT-07-sHvyBZw-zuKrOPNsMTqLN5UGTWjs25yl-S1fEedPep8BBSK6kxGfvZE" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yaA5mOfK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://lh5.googleusercontent.com/-lffkrC85f44h7ixlMyBVVIqB2j-KZOLSeRoDu4MyTG28CYh3uKgMEJ_4iniEze9Gyee9cm6NCvT-07-sHvyBZw-zuKrOPNsMTqLN5UGTWjs25yl-S1fEedPep8BBSK6kxGfvZE" alt="Flame graph"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Digging into the flame graph, I noticed that &lt;code&gt;record_event&lt;/code&gt; spends a big portion of its execution time in &lt;code&gt;sqlalchemy.orm.configure_mappers&lt;/code&gt;, and that this happens on every job execution. From &lt;a href="https://docs.sqlalchemy.org/en/13/orm/mapping_api.html#sqlalchemy.orm.configure_mappers"&gt;the SQLAlchemy docs&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Initialize the inter-mapper relationships of all mappers that have been constructed thus far.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This sort of thing really doesn’t need to happen on every fork. We could initialize these relationships in the parent worker once and avoid the repeated effort in the work horses.&lt;/p&gt;
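&lt;p&gt;The pattern looks roughly like this (a simplified sketch with a dummy stand-in for the real &lt;code&gt;configure_mappers&lt;/code&gt; call; POSIX-only): run the expensive setup once in the parent, and every forked work horse inherits the result via copy-on-write instead of redoing it.&lt;/p&gt;

```python
import os

def expensive_setup():
    # Stand-in for sqlalchemy.orm.configure_mappers(): run once, pre-fork.
    return {"mappers_configured": True}

def process_jobs(jobs):
    state = expensive_setup()  # once, in the parent worker
    completed = 0
    for job in jobs:
        pid = os.fork()
        if pid == 0:
            # Work horse: the configured state is inherited from the
            # parent, so no per-job re-initialization is needed here.
            assert state["mappers_configured"]
            os._exit(0)
        _, status = os.waitpid(pid, 0)
        if os.WEXITSTATUS(status) == 0:
            completed += 1
    return completed
```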

&lt;p&gt;So &lt;a href="https://github.com/getredash/redash/pull/4314"&gt;I added a call&lt;/a&gt; to &lt;code&gt;sqlalchemy.orm.configure_mappers()&lt;/code&gt; before starting the work horse and measured again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ docker-compose run --rm server manage shell &amp;lt;&amp;lt;&amp;lt; "from redash.tasks.general import record_event; [record_event.delay({ 'action': 'create', 'timestamp': 0, 'org_id': 1, 'user_id': 1, 'object_id': 0, 'object_type': 'dummy' }) for i in range(1000)]
$ docker-compose exec worker bash -c "time ./manage.py rq workers 4 benchmark"

real    0m39.348s
user    0m15.190s
sys 0m10.330s
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If we deduct our 14.7-second boot time, we are down from 102 seconds to 24.6 seconds for 4 workers to handle 1,000 jobs. That’s a 4x improvement! With this fix we managed to cut our RQ production resources to a quarter while keeping the same throughput.&lt;/p&gt;

&lt;p&gt;My key takeaway: keep in mind that your app behaves differently as a single process than it does when it forks. If there’s heavy lifting to perform on every job, it’s usually a good idea to do it once, before the fork. These kinds of things don’t show up during testing and development, so make sure you measure well and dig into any performance issues that come up.&lt;/p&gt;

</description>
      <category>python</category>
      <category>performance</category>
    </item>
    <item>
      <title>What’s ahead for Redash in 2019 🤩</title>
      <dc:creator>Arik Fraimovich</dc:creator>
      <pubDate>Wed, 16 Jan 2019 10:51:34 +0000</pubDate>
      <link>https://dev.to/redash/whats-ahead-for-redash-in-2019--2998</link>
      <guid>https://dev.to/redash/whats-ahead-for-redash-in-2019--2998</guid>
      <description>&lt;p&gt;When I wrote &lt;a href="https://medium.com/@arikfr/the-journey-from-side-project-to-open-source-company-taking-the-first-step-8e8259ac80cb"&gt;the blog post announcing the founding of the Redash company&lt;/a&gt;, I was naive enough to believe that I would write regular updates about the process of building Redash as a product and a company. Silly me. Turns out it's a full-time job and then some to build a product, bootstrap a company and grow a team. Something had to give; and one such thing was regular updates about the process. So much has happened by now that one article could hardly cover it all. But with 2018 recently concluded, here is my attempt to summarize the last twelve months and take a peek at what's to come.&lt;/p&gt;

&lt;p&gt;On its surface, Redash's trajectory in 2018 went like previous years: we added users and customers, grew our revenues, and made the product better than ever. But there was one big difference as we entered 2018: Redash became a team effort.&lt;/p&gt;

&lt;p&gt;Until late 2017, Redash was a one-man operation. It became a group of three at the start of 2018, and now, for 2019, there are six of us. What enabled this team growth was our customers. As before, we haven’t taken any external funding. To all our past and present customers: &lt;strong&gt;thank you 🙏&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;While we started as a team of 3 in 2018, we are starting 2019 as a team of 6. What enabled this team growth was our customers.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;As Redash matured as a company, the product also matured dramatically in 2018. We had three major releases (&lt;a href="https://medium.com/r/?url=https%3A%2F%2Fblog.redash.io%2Fredash-v4-is-out-36c11fe0c682"&gt;v4&lt;/a&gt;, &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fblog.redash.io%2Fredash-v5-is-out-1d9a3d93c20a"&gt;v5&lt;/a&gt; and &lt;a href="https://medium.com/r/?url=https%3A%2F%2Fblog.redash.io%2Fjust-in-time-for-christmas-redash-v6-70cb23dfbbf3"&gt;v6&lt;/a&gt;) that included:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;New UI and better UX&lt;/li&gt;
&lt;li&gt;Dynamic dashboard layouts&lt;/li&gt;
&lt;li&gt;Tagging and Favorites&lt;/li&gt;
&lt;li&gt;Parameters UI improvements&lt;/li&gt;
&lt;li&gt;More visualizations&lt;/li&gt;
&lt;li&gt;More data sources&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--1I2N-qG7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AWYVzC1Kmgia6OzFSlM_h5w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--1I2N-qG7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://cdn-images-1.medium.com/max/800/1%2AWYVzC1Kmgia6OzFSlM_h5w.png" alt="New Dashboards UX introduced in V4"&gt;&lt;/a&gt;&lt;br&gt;A glimpse of the new UI introduced in V4.
  &lt;/p&gt;

&lt;h1&gt;&lt;strong&gt;Plans for 2019&lt;/strong&gt;&lt;/h1&gt;

&lt;h2&gt;&lt;strong&gt;Team Effort&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;As the team grows, we're building procedures that keep Redash a fun and productive place to work and (importantly) make &lt;strong&gt;me&lt;/strong&gt; redundant to the day to day operations of the company. This will improve team velocity, promote greater stability for our customers, and free my time to focus on more strategic projects.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Even more interactive dashboards and queries&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;In the releases we made in 2018, we improved the UI for parameters significantly. In 2019 we’re going to finish this effort by adding a few more needed capabilities to parameters (optional parameters, multiple select, better support in dashboards) and by making parameters available everywhere (read only users, shared dashboards, embeds).&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Better Permissions Model&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;Once we have the ability to safely run parameterized queries, we can finally upgrade the Redash permission model and make it more user friendly. The goal is to augment the current model (if you have access to the data source, you have access to the query/dashboard) with a Google Drive like model, where you can assign permissions to individual users or groups on the content level.&lt;/p&gt;

&lt;p&gt;Coupled with the improved parameters support mentioned above, Redash in 2019 will reach further and empower more people inside your organizations.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Technical Progress&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;This year, we plan to conclude our migration to React that began in 2018. This should make it easier to add features and interface with our open-source community. We will also continue to improve the testing story around Redash (thanks Cypress and Percy!). All this will allow us to release more often with greater confidence.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;Community&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;We recently invited nine members from the user community to become Redash maintainers. In 2019, we want to improve our coordination with maintainers and clean up our backlog of Pull Requests.&lt;/p&gt;

&lt;h2&gt;&lt;strong&gt;More Guides and Resources&lt;/strong&gt;&lt;/h2&gt;

&lt;p&gt;One thing we neglected in 2018 was to produce up-to-date documentation and usage examples. We want to change this in 2019, but we would also appreciate your feedback! If you have any interesting uses of Redash, we urge you to share them. &lt;a href="https://medium.com/r/?url=http%3A%2F%2Fdiscuss.redash.io"&gt;The forum&lt;/a&gt; is a great venue for this.&lt;/p&gt;




&lt;p&gt;These are our plans &lt;em&gt;for now&lt;/em&gt;. It’s known that no plan survives first contact, so we hope to still surprise you (and ourselves 😅) with some of what we will achieve in 2019. 🍾&lt;/p&gt;

</description>
      <category>yearinreview</category>
      <category>2018</category>
      <category>2019</category>
    </item>
  </channel>
</rss>
