I recently read Blink — the book about how experts can spot mistakes in seconds (sometimes milliseconds). Developers build the same intuition: after getting burned enough times, you just feel when code will be slow.
I keep getting hit by the same types of slow code, so here are the patterns I now spot instantly and how to fix them.
The N+1 Query
Ah yes, the thing I see most often when developing. It causes your app to slow down significantly. Why? Because you call a query inside a loop. Think about it: you have one user and a thousand products. Each product has its detail somewhere in another table, and on every iteration of the product loop you query that product's detail. For me this isn't a problem if the user only has around 5 products, but at 1000 products you'd better ditch the code and start over. The code I usually encounter looks like this:
const products = await Product.getAll(userId); // fetch all products
for (let i = 0; i < products.length; i++) {
  const product = products[i];
  const productDetail = await ProductDetail.getOne(product.id); // one query per product!
  // ...other logic
}
or
const products = await Product.getAll(userId); // fetch all products
products.map(async product => {
  const productDetail = await ProductDetail.getOne(product.id);
  // ...other logic
});
Be careful: `async` functions return promises, so `.map(async () => {})` returns an array of promises. You should wrap it in `Promise.all()`.
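To make that concrete, here's a minimal sketch of collecting the promises from `.map(async ...)` with `Promise.all()`. Note that `ProductDetail` here is a made-up stand-in, not a real model:

```javascript
// Hypothetical stand-in for a real model; it just resolves a fake detail row.
const ProductDetail = {
  getOne: async (id) => ({ product_id: id, detail: `detail-${id}` }),
};

// .map(async ...) returns an array of pending promises, not results...
function fetchAllDetails(products) {
  const pending = products.map(async (p) => ProductDetail.getOne(p.id));
  // ...so wrap it in Promise.all to wait for every one of them.
  return Promise.all(pending);
}

fetchAllDetails([{ id: 1 }, { id: 2 }, { id: 3 }]).then((details) => {
  console.log(details.length); // 3
});
```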
Both snippets have the async-inside-a-loop problem, which, just like I said before, is totally fine for 5-15 products, but with 1000 products... eeh. I usually tackle the situation like the code below:
const products = await Product.getAll(userId); // fetch all products
const allProductIds = products.map(x => x.id);
const allProductDetails = await ProductDetail.getAllBasedOnId(allProductIds); // one query
for (let i = 0; i < products.length; i++) {
  const product = products[i];
  const productDetails = allProductDetails.filter(d => d.product_id === product.id);
}
You can do it like the code above, or you can convert allProductDetails into a hash table (an Object or a Map, if you prefer) instead of an array:
const products = await Product.getAll(userId);
const ids = products.map(p => p.id);
const allDetails = await ProductDetail.getAllByIds(ids);
// fast lookup
const detailsMap = new Map(allDetails.map(d => [d.product_id, d]));
for (const product of products) {
  const detail = detailsMap.get(product.id);
  // ...other logic
}
There are multiple ways to handle it; for example, you can push everything into a single query with joins and keep the rest in application logic:
const allProductDetails = await ProductDetail.getAll({
  join: {
    relation: Product,
    key: 'id',
    field: [],
    join: {
      relation: User,
      key: 'id',
      field: [],
      where: 'user = ' + userId,
    },
  },
});
for (let i = 0; i < allProductDetails.length; i++) {
  const productDetail = allProductDetails[i];
}
Note: the .getAll() is just an example
With the code above we save a lot of time because we avoid calling the DB on every single iteration. Calling the database repeatedly is like driving across town for groceries but buying one item at a time instead of everything in one trip.
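To see the difference in numbers, here's a toy sketch. There's no real database here; `makeDb` is a made-up in-memory stand-in that just counts how many queries we issue:

```javascript
// Toy in-memory "database" that counts how many queries we issue.
function makeDb() {
  const details = new Map([[1, 'a'], [2, 'b'], [3, 'c']]);
  let queries = 0;
  return {
    getOne: (id) => { queries++; return details.get(id); },
    getMany: (ids) => { queries++; return ids.map((id) => details.get(id)); },
    queryCount: () => queries,
  };
}

const products = [{ id: 1 }, { id: 2 }, { id: 3 }];

// N+1 style: one query per product.
const db1 = makeDb();
for (const p of products) db1.getOne(p.id);
console.log(db1.queryCount()); // 3 queries for 3 products

// Batched style: a single query for all ids.
const db2 = makeDb();
db2.getMany(products.map((p) => p.id));
console.log(db2.queryCount()); // 1 query, no matter how many products
```

With 1000 products, the first version issues 1000 extra round trips; the second still issues one.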
Index, a Double-Edged Sword
One of the most common places where performance tanks is when you start joining big tables. At first the query looks innocent:
SELECT * FROM product_history AS ph
INNER JOIN product AS p ON ph.product_id = p.id
For small data it's OK. But if the record count is reaching 1M or even 500K, you might consider adding an index. There are two ways to add one: with a FOREIGN KEY, or with CREATE INDEX.
Adding Index With Foreign Key
# Add Foreign Key
ALTER TABLE product_history ADD FOREIGN KEY(product_id) REFERENCES product(id) ON DELETE [CASCADE | NO ACTION | RESTRICT | SET NULL] ON UPDATE [CASCADE | NO ACTION | RESTRICT | SET NULL]
Note: ALTER TABLE ... ADD FOREIGN KEY semantics vary by engine (SQLite often requires a table rebuild). Check your DB docs.
The example above adds a foreign key to your table. What does a foreign key do? It's just for data integrity between two tables, that's all: it makes sure every product_history.product_id exists in product.id. And if something changes in the parent table, the connected table takes action: depending on the option you chose, it either restricts the change, sets the column to NULL, or cascades the delete/update.
But here's the bonus: in MySQL (and most relational databases), when you declare a foreign key, the database will either require an index on that column or silently create one for you. That means when you join the two tables later, the database doesn't have to scan row by row; it can use the index to jump straight to the matching rows.
Adding an Index Directly
You can also add an index explicitly:
CREATE INDEX idx_product_history_product_id ON product_history(product_id [ASC | DESC])
The query above creates an index named idx_product_history_product_id on the product_history table, on the column product_id, sorted ASC or DESC depending on which one you choose.
Let me say it again: with only 100 to 10K records the query is not a problem. But at 500K or 1M rows this simple query becomes a bottleneck. And of course, be careful when adding indexes: too many indexes make write performance slow, so add an index only when it's necessary.
Indexes are also great when you search for something in the DB, like:
SELECT * FROM product_history AS ph
INNER JOIN product AS p ON ph.product_id = p.id WHERE p.code = 'ADF'
you can add an index on p.code like:
CREATE INDEX idx_product_code ON product(code [ASC | DESC])
Indexes aren’t limited to foreign keys; you can (and should) use them for filtering conditions in your WHERE clauses too. And of course, consult the documentation when you want to add indexes the optimal way.
At the end of the day, indexes are your best friend for speeding up your queries, but like any other sharp tool, misuse can cut you. Use EXPLAIN, understand your database’s planner, and add indexes only where they matter.
Note: if you already have 500K or 1M records and you add a new index, the database will take its time to complete the operation, because an index is a separate structure (usually a B-tree) that has to be built and stored alongside the table. During that operation, your writes may be blocked or slowed down. Early in my career this caused some panic (always have another pair of eyes when you play with a prod DB).
Use EXPLAIN to Verify
Use EXPLAIN to see whether your query uses an index. But be wary, each relational database (MySQL, PostgreSQL, SQLite, etc.) can have a very different query planner (SQL optimizer).
The planner is the brain of the database: it decides whether to use an index, a sequential scan, or another strategy to get the fastest result. That's why the same query can look very different depending on the database:
- MySQL often prefers indexes aggressively. You’ll see things like eq_ref or ref in the EXPLAIN output when an index is used.
- PostgreSQL is cost-based: if scanning the whole table is cheaper, it will happily choose a full table scan, even when an index exists.
- SQLite also has a planner, and it behaves closer to Postgres: if your query isn’t selective, it may ignore the index.
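Each database has its own way of asking the planner what it intends to do. Assuming the product_history/product tables from earlier, the invocations look roughly like this:

```sql
-- MySQL: classic EXPLAIN (watch the `type` and `key` columns)
EXPLAIN SELECT * FROM product_history AS ph
INNER JOIN product AS p ON ph.product_id = p.id;

-- PostgreSQL: EXPLAIN ANALYZE actually runs the query and shows real timings
EXPLAIN ANALYZE SELECT * FROM product_history AS ph
INNER JOIN product AS p ON ph.product_id = p.id;

-- SQLite: EXPLAIN QUERY PLAN gives the high-level plan
EXPLAIN QUERY PLAN SELECT * FROM product_history AS ph
INNER JOIN product AS p ON ph.product_id = p.id;
```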
From the images above you can see that in the first picture the table ph is not using an index (type: ALL), while in the second picture it is (type: eq_ref or ref).
Here are some examples in PostgreSQL and SQLite:
Even though an index exists in Postgres, it may still pick a sequential scan if it estimates that's cheaper than using the index (cost-based optimizer).
In SQLite, every table has a hidden rowid unless the table was created WITHOUT ROWID. If you declare INTEGER PRIMARY KEY on a column, that column becomes an alias for rowid. That’s why SQLite's EXPLAIN often shows SEARCH ... USING INTEGER PRIMARY KEY (rowid=?). For more info about rowid, see here.
So if you see different EXPLAIN outputs across databases, it’s just the optimizer making trade-offs based on its internal cost model. You can refer to their documentation for more clarification.
SELECT, Death by a Thousand Bytes
Let's continue with the same example:
SELECT * FROM product_history AS ph
INNER JOIN product AS p ON ph.product_id = p.id
Did you see something wrong? The asterisk (*) is the problem; you should almost never use it. It fetches every column. Imagine you have 20 columns and 10 of them are TEXT or MEDIUMTEXT, or even worse, JSON. Ugh... the pain.
for reference:
- TEXT max is ~65 KB (65,535 bytes)
- MEDIUMTEXT max is ~16 MB (16,777,215 bytes)
- JSON: it depends (each database implements it very differently)
Storage requirements for MySQL are documented here
Now let’s play with numbers:
- Say you fetch 100,000 rows, each with a TEXT column.
- Worst case, that’s 100,000 × 65,535 ≈ 6.5 GB of raw data.
- And of course, the databases apply compression and optimizations, but you get the idea, the memory footprint can blow up so fast.
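The back-of-the-envelope math above is easy to check:

```javascript
// Worst case: 100,000 rows, each with a TEXT column at its maximum size.
const rows = 100_000;
const textMaxBytes = 65_535; // MySQL TEXT max size in bytes
const totalBytes = rows * textMaxBytes;
console.log((totalBytes / 1e9).toFixed(2) + ' GB'); // "6.55 GB" of raw, uncompressed data
```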
The database will use a lot of memory just to process that SELECT query, and if you combine it with the previous problem, a missing index, no wonder the fetch is slow.
First of all, when you create a table, please consider what data types you actually need; don't just throw a large type in there. If it's a UUID, use CHAR(36) or something close; don't put CHAR(200) in there just because you want to "scale for the future". As a software developer you have to solve the problem in front of you, so think about how to solve that one well first.
How to fix it? Name the columns properly:
SELECT ph.product_id, p.product_name, ph.activity
FROM product_history AS ph
INNER JOIN product AS p ON ph.product_id = p.id
It's convenient to type the asterisk (*), but don't turn that into a habit. Select only the necessary columns.
Loops, The Good and The Bad (JS)
Loops are among the most important pieces of code in every language. You can't avoid them; if you've ever studied time complexity, it always involves loops and the amount of data.
But before I explain further: if you're not a developer who has bound their soul to JavaScript, you can skip this part.
In JavaScript, there are three common ways to loop over data:
for
My favorite kind of loop: simple, straightforward, and no surprises, it does exactly what you tell it.
// i can do this
for (let i = 0; i < 10; i++) {}
// i can do this too
for (let i = 10; i >= 0; i--) {}
// even this
for (let i = 0; i < 10; i*=2) {}
// heck even this
for (let i = Math.E; i < Math.pow(Math.PI, 20); i += Math.pow(i, 2)) {}
Endless possibilities; the only thing holding you back is your imagination. Straightforward. This classic loop leaves a small memory footprint, which eases the pain for JavaScript's garbage collector. And since it's so close to how the CPU works, the engine optimizes it well.
for (const item of items)
for (const item of items) { /* ... */ }
Not my favorite loop, but I don't hate it either.
- It uses the iterator protocol under the hood (roughly: repeatedly calling a next() method that returns { value, done }; when done is true, the loop stops).
- The engine may optimize it less aggressively than a classic for loop, since extra iterator machinery is involved.
Still, for moderate data sets it’s perfectly fine and often more readable.
for more information about Iterator here
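You can poke at the iterator protocol directly to see the { value, done } shape for...of consumes:

```javascript
const arr = ['a', 'b'];

// for...of does roughly this behind the scenes:
const it = arr[Symbol.iterator]();
console.log(it.next()); // { value: 'a', done: false }
console.log(it.next()); // { value: 'b', done: false }
console.log(it.next()); // { value: undefined, done: true } -> loop stops
```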
arr.forEach(callback)
Hated it. Why? It's slow in hot paths. If you have lots of records to loop over, it will drag down your APIs, especially when the loop is nested (sometimes we can't avoid nested loops). And many people use it (because it's clean), yet you can't even stop it with break; you have to throw an exception to break out. Why I hate it:
- It allocates a callback function for every iteration. More functions = more garbage for the collector.
- You can't stop the loop with break; you have to throw an exception to stop it.
- Nesting forEach calls is a quick way to create GC (garbage collector) pressure.
- The optimizer will probably give up on heavily nested forEach chains.
forEach is fine for examples, but in performance-sensitive code you should replace it.
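One concrete difference: a classic for loop can stop early with break, while forEach always visits every element unless you throw:

```javascript
const nums = [1, 2, 3, 4, 5];

// Classic for loop: stop as soon as we find what we need.
let visitedFor = 0;
for (let i = 0; i < nums.length; i++) {
  visitedFor++;
  if (nums[i] === 3) break; // early exit works
}
console.log(visitedFor); // 3

// forEach: no break available; the callback runs for every element.
let visitedForEach = 0;
nums.forEach((n) => {
  visitedForEach++;
  if (n === 3) return; // only skips the rest of THIS callback, not the loop
});
console.log(visitedForEach); // 5
```

On five elements nobody cares, but multiply that wasted work by 100K rows inside a nested loop and it adds up.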
Conclusion
You don't know how many times I've been hit by the problems above; whenever I get a complaint from a client that a report feature is slow, it's always these four problems. Sometimes I've had to find a creative way to solve it, but when you hit a performance problem, always look for these four first. It saves a lot of time. Here is my step-by-step for handling performance problems:
- Look at how much data you're playing with
- Look at how many tables you join
- Look at the code, does it have N+1 Problem?
- Look at the Select Query
- Look at the Index
- Look at Query Performance Report using EXPLAIN
- Then change the code to a different approach if needed
As I wrote above, this is just my way to measure and benchmark. But of course, what makes an app slow comes down to how you access the data, how much data you play with, and how you store it. It's always bound to CPU, RAM, and disk.
And as always: measure first, then fix.
And if you're building a performance-sensitive app, maybe don't use JavaScript, just talking from experience here.