DEV Community

Ian

How I Learned to Think at Scale

Introduction

Hey there, my name is Ian. I'm a software engineer for a company whose website gets hundreds of thousands of visitors every month. This may seem small to some, but it's my first time writing code and managing deployments for a website of this scale. That has come with plenty of lessons and growing pains, and I want to share some of those lessons and how I learned them.

The Setting

I recently moved the website's account server (it controls user logins and session tokens) to a Kubernetes cluster so we could have minimal downtime and built-in load balancing. Before the move, session tokens were kept in memory. Usually, this wouldn't be a huge problem, but with Kubernetes you need to be careful about stateful applications: requests can land on any replica, and a token held in one Pod's memory is invisible to the others and disappears when that Pod restarts. In this case, the solution was to move our session tokens to the MongoDB instance outside the cluster.
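To make the in-memory problem concrete, here is a minimal sketch. It is not our actual account server: the class names are hypothetical, and a plain Map stands in for the external MongoDB instance. It only shows why a token issued by one replica fails on another unless the tokens live in a shared store.

```javascript
// Hypothetical sketch: each "pod" object models one replica of the server.

// Replica that keeps session tokens in its own process memory.
class InMemoryPod {
  constructor() {
    this.sessions = new Map(); // visible only to this replica
  }
  login(userId) {
    const token = `token-${userId}`; // real tokens should be random
    this.sessions.set(token, userId);
    return token;
  }
  isValid(token) {
    return this.sessions.has(token);
  }
}

// Replica that keeps session tokens in a store shared by all replicas.
// The Map passed in stands in for a database outside the cluster.
class SharedStorePod {
  constructor(store) {
    this.store = store;
  }
  login(userId) {
    const token = `token-${userId}`;
    this.store.set(token, userId);
    return token;
  }
  isValid(token) {
    return this.store.has(token);
  }
}

// Log in on one replica, then send the next request to a different one.
const podA = new InMemoryPod();
const podB = new InMemoryPod();
const inMemoryResult = podB.isValid(podA.login("ian"));

const sharedStore = new Map();
const podC = new SharedStorePod(sharedStore);
const podD = new SharedStorePod(sharedStore);
const sharedResult = podD.isValid(podC.login("ian"));

console.log({ inMemoryResult, sharedResult });
```

With in-memory tokens the second replica rejects the session; with the shared store it accepts it.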

Now that we have our tokens being managed correctly, things should be all good to go... right?

Unfortunately, no.

The Problem

I woke up on a Saturday morning to a pleasant Slack message: "The site is down, no one can log in."

Fun!

I opened my laptop and confirmed that the website was indeed down, specifically the Account Server that I had just finished moving to Kubernetes. My stomach dropped. I had worked hard learning Kubernetes, Docker, and Nginx to migrate the server. It felt like all that work was for nothing.

The Process Of Debugging

I immediately checked the status of the Kubernetes Pods; all were running. Next, I opened the logs for each Pod and used kubectl describe pod <pod_name> to gather more information. All Pods were alive and well, so why couldn't users log in?

It was time to get my hands dirty and load up the Account Server locally to do some testing. All the requests completed instantly. MongoDB reads and writes took 1 ms, our user index was being used, and connecting to the production database worked too.

This meant the problem only showed up at scale.

After a couple of hours of reviewing and rewriting code, I picked up on our first clue! Only the endpoints that used MongoDB's MongoClient were returning 504 (Gateway Timeout) errors.
I decided to try some queries in the mongo CLI to see if we were having issues with reading and writing. First, I tried a findOne on the users collection, and that worked fine. Next, I tried writing a user with insertOne, and that also worked fine.

Hmmm. What could the issue be then? Without any more clues to go on, I updated the mongodb npm package to the latest version in the hope that I'd run into a bug that had since been fixed. Unfortunately, we were still in no man's land with no success.

Eureka!

Out of curiosity, I decided to run a findOne query on our tokens collection. It took 10 seconds. This might not seem that long, but compared to the millisecond response times on our users collection, it was a huge difference.

I used MongoDB's .explain() function on a tokens collection query and realized it was scanning every token document. This explained exactly why our requests were timing out. Every single time a user started a session, MongoDB would test the query against all of the tokens in our database.
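For anyone who hasn't read explain() output before, the thing to look for is the stage name in the winning plan: COLLSCAN means a full collection scan, while IXSCAN means an index was used. Here is a rough sketch of a helper that checks for this. The two sample plans are hand-written to look like explain() output (they are not the real output from our database), the field name tokenId is an assumption, and the exact shape of the plan can vary between server versions.

```javascript
// Walk a query plan tree and collect every stage name,
// e.g. "COLLSCAN", "FETCH", "IXSCAN".
function collectStages(plan, stages = []) {
  if (!plan) return stages;
  if (plan.stage) stages.push(plan.stage);
  if (plan.inputStage) collectStages(plan.inputStage, stages);
  for (const child of plan.inputStages || []) collectStages(child, stages);
  return stages;
}

// True if the winning plan contains a full collection scan.
function usesCollectionScan(explainOutput) {
  const winningPlan = explainOutput.queryPlanner.winningPlan;
  // Some newer server versions nest the readable plan under "queryPlan".
  const root = winningPlan.queryPlan || winningPlan;
  return collectStages(root).includes("COLLSCAN");
}

// Hand-written samples shaped like explain() output; "tokenId" is hypothetical.
const withoutIndex = {
  queryPlanner: {
    winningPlan: {
      stage: "COLLSCAN",
      filter: { tokenId: { $eq: "abc" } },
      direction: "forward",
    },
  },
};

const withIndex = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: {
        stage: "IXSCAN",
        keyPattern: { tokenId: 1 },
        indexName: "tokenId_1",
      },
    },
  },
};

console.log(usesCollectionScan(withoutIndex), usesCollectionScan(withIndex));
```

In the shell, the object passed to a helper like this would come from something like db.tokens.find({ tokenId: "abc" }).explain().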

This was a huge issue.

I simply used db.tokens.createIndex() on the token's id and BOOM, the problem was solved.
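In the shell, the call would look something like db.tokens.createIndex({ tokenId: 1 }), where tokenId is a stand-in for whatever field holds the token's id. To show why this makes such a difference, here is a toy simulation in plain JavaScript. It is only a sketch: MongoDB's indexes are B-trees, not a Map, and the collection here is just an array, but the "documents examined" comparison is the same idea.

```javascript
// Toy "collection" of token documents; the field names are hypothetical.
const tokens = Array.from({ length: 100000 }, (_, i) => ({
  tokenId: `t${i}`,
  userId: i,
}));

// No index: test the query against documents one by one until one matches.
function findOneByScan(docs, tokenId) {
  let examined = 0;
  for (const doc of docs) {
    examined++;
    if (doc.tokenId === tokenId) return { doc, examined };
  }
  return { doc: null, examined };
}

// "Index": a lookup table from tokenId to document, built once up front.
function buildIndex(docs) {
  return new Map(docs.map((doc) => [doc.tokenId, doc]));
}

// With the index: jump straight to the matching document.
function findOneByIndex(index, tokenId) {
  const doc = index.get(tokenId) || null;
  return { doc, examined: doc ? 1 : 0 };
}

// Look up the most recently added token, which sits at the end of the array.
const scanResult = findOneByScan(tokens, "t99999");
const tokenIndex = buildIndex(tokens);
const indexResult = findOneByIndex(tokenIndex, "t99999");

console.log(scanResult.examined, indexResult.examined); // 100000 1
```

The scan has to look at every document to find the newest token, while the indexed lookup touches one. Multiply that by every login request and the timeouts make sense.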

Lessons I Learned

  1. Know your databases! Learn the tools for scaling your database. Indexes, pool sizes, replica sets, etc. are all essential to scaling a MongoDB database.
  2. Think about each database query before pushing your code to production. How often does this query run? How expensive is this query?
  3. Even if your server goes down after you migrate it to Kubernetes, that doesn't mean your work was wasted. It's important to use it as a learning experience.

Conclusion

If you made it this far, hopefully you enjoyed the read and learned something too! If you'd like to follow me on other platforms, I stream on Twitch, and you can also find me on Twitter.
Thanks for reading!

Discussion (3)

Mohammed Almajid

What are you using to connect to MongoDB? I mean, do you use Mongoose or something else?
Also, it's weird that it queries all the tokens when you're calling findOne. Why is that?

Ian Author

For this project we use MongoDB's MongoClient, which comes straight out of the box.

My knowledge is a bit fuzzy on the second question, but I believe it's because, if you don't have an index set up for your query, MongoDB tests the query against every document in the collection. So I'm not getting all the documents back; MongoDB is testing my query against all of them. Bad wording on my part.

Mohammed Almajid

I think indexing is important for performance.