
Mukit, Ataul

The AWS Trap for Startups

We have been running a fairly successful venture, https://cookups.com.bd, for some time, and its user base is growing day by day.

Our startup, Cookups, is a kind of Uber for homemade food. It is one of the biggest portals for ordering homemade food in our country.

A few days ago, there was a huge surge in CPU and RAM usage on our system (which runs on AWS, of course). Our initial assumption was that, since our user base is growing exponentially, this was quite a nice headache to have: orders were piling up, and all we needed to do was choose a higher AWS tier (say, an M-series instance instead of a T-series one) with more CPU power and RAM, and the problem would be resolved. Since it's a live system, we didn't want to take any chances and were about to go for the higher configuration before common sense prevailed.

Thankfully, we decided to first run a test on the staging server, importing the production database and giving it more CPU and RAM, to see whether the problem would go away. The next day, CPU and RAM usage surged again at a specific time, and even with more RAM and CPU power the problem didn't subside.

Yes, more users are registering in our system day by day, but CPU and RAM usage shouldn't peak at a specific time. Then we found the cause: a scheduler runs some scheduled tasks every day, and because there was some corrupted data in our database (due to an earlier coding error, which we later fixed), the database was getting locked. CPU and RAM usage climbed as the scheduler kept failing and retrying the scheduled tasks over and over again. So, we were about to throw hardware at a software problem.
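One common guard against this kind of retry storm is to cap the number of retries and back off between attempts, so a permanently failing job surfaces as an error instead of looping. The following is only a sketch under assumptions: `run_with_bounded_retries`, `max_attempts` and `base_delay` are hypothetical names, and it assumes the scheduler lets you wrap the task in a plain Python callable. It is not a description of the scheduler Cookups actually uses.

```python
import time


def run_with_bounded_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run `task`, trying at most `max_attempts` times with exponential backoff.

    Hypothetical helper: once the attempts are used up, the last error is
    re-raised so the failure becomes visible instead of retrying forever.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and let monitoring/alerting see the failure
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

With a cap like this, a task that fails on corrupted data would stop after a few attempts and show up in the error logs, rather than as an unexplained CPU and RAM peak.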

Since it's easy to upgrade RAM and CPU on AWS (at least, that's what gets promoted, intentionally or unintentionally, all over the internet), the moment we see CPU or RAM usage increasing there is an inclination to scale up vertically, although the problem may be elsewhere. That's why I call it the AWS trap.

So here are some tips on how to avoid jumping straight to scaling without looking deeper into the matter:

1) Check the nature of the CPU and RAM usage.

If CPU or RAM usage increases drastically at a given time of day, there is a high chance that the problem is in the code or the database.

2) See if the RAM or CPU usage graph looks like the following:

[Figure: cpu/ram usage graph]

Most likely, for a startup, this type of CPU or RAM graph is not caused by a sudden increase in users. It is more likely due to a coding error.

3) Before actually scaling (vertically or horizontally), look for ways to identify the bottlenecks (most likely there are quite a few of them in the initial stages, whether you admit it or not) and solve them first.

4) Review the existing code, improve SQL/ORM queries where possible, and make sure you really understand what's going on.

5) Consider scaling and premature optimisation as the last resort for any new system.
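The first two tips can be checked roughly in code. The sketch below assumes you can export usage samples from your monitoring tool as `(datetime, cpu_percent)` pairs. The function name `recurring_spike_hours` and the `threshold` and `min_days` parameters are made up for illustration. It flags hours of the day at which usage spikes on several different days, which hints at a scheduled job rather than organic growth.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import median


def recurring_spike_hours(samples, threshold=2.0, min_days=2):
    """Return the hours (0-23) at which CPU usage spikes on at least `min_days` days.

    samples: iterable of (datetime, cpu_percent) pairs.
    A sample counts as a spike when it exceeds `threshold` times the
    median of all samples (an arbitrary, illustrative rule).
    """
    samples = list(samples)
    if not samples:
        return []
    baseline = median(cpu for _, cpu in samples)
    spike_days = defaultdict(set)  # hour of day -> dates on which it spiked
    for ts, cpu in samples:
        if cpu > threshold * baseline:
            spike_days[ts.hour].add(ts.date())
    return sorted(h for h, days in spike_days.items() if len(days) >= min_days)


# Synthetic data: three days of hourly samples, idle at 20% with a peak at 02:00.
start = datetime(2024, 1, 1)
demo = [(start + timedelta(hours=h), 95.0 if h % 24 == 2 else 20.0) for h in range(72)]
print(recurring_spike_hours(demo))  # prints [2]
```

If the same hour keeps showing up, look at what runs at that time (cron jobs, schedulers, backups) before touching the instance size.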

signing off,
@mukit

Top comments (10)

Gijo Varghese

Why the title "The AWS Trap for Startups"? It will happen if you're on any other cloud, right?

Mukit, Ataul • Edited

It will happen in any other cloud, but the point is that AWS kind of promotes throwing hardware at problems that can be solved with software. So, that's why it's the AWS trap :)

Gijo Varghese

My first thought on reading the headline was that something is wrong with AWS for startups. After reading the article, AWS has nothing to do with this. I'll never upgrade my hardware if I see spikes like this. It's purely caused by software, so don't blame AWS.

Mukit, Ataul

Since it's easy to upgrade RAM and CPU on AWS (at least, that's what gets promoted, intentionally or unintentionally, all over the internet), the moment we see CPU or RAM usage increasing there is an inclination to scale up vertically, although the problem may be elsewhere. Inexperience is to blame, not AWS.

Gijo Varghese

Yes, even Google Cloud will recommend that you increase server specs in this situation.

Joseph Snell

Yes, avoid the classic 'throwing hardware at a software problem'.

Mukit, Ataul

Absolutely! Beautifully put!

Mukit, Ataul

Made me add this line to the article itself without taking permission :)

Sebastian G. Vinci

The clickbait title tho...

Mukit, Ataul

Don’t think you read the part where I explain why I said so!