In our office we have a TV with a performance dashboard, so everyone can see possible problems in the system at any time. Amongst other things we show the TTFB on this dashboard. Those two spikes are not what you want to see, when you are just about to leave the office for lunch.
We started looking around what’s going on and found out the following in that order:
- the number of workers on the app-server rose dramatically
- but the CPU consumption did not
- the number of connections to the database reached the upper limit
- but the CPU consumption did not rise (in fact it decreased)
- and finally we start to see the following error messages in our monitoring tools:
Deadlock found when trying to get lock; try restarting transaction
- by that time the system was not responding anymore
- we checked the processlist from the database to see what it is doing
- and found loads of
REPLACE INTOstatements and several big
DELETEstatements (running since minutes) on the same table
It’s a cache lookup table. Everytime a page gets stored in our frontend cache, we do store some meta information about that cache entry in that database table, mostly dependencies to object data. And when that object data changes (for example the article stock), we fetch all cache entries depending on that object, flush them from the frontend cache and remove the information about it from the lookup table via a simple
New entries in that lookup table are not created via
INSERT, but via a
REPLACE INTO statement. This is because it could happen, that two requests hit the same page, both are a cache miss and both try to create an entry in that lookup table. Easy solution …
While digging deeper into that problem with our deadlocks, i found an article from Jervin Real from Percona about the InnoDB gap lock where he states the following:
For a non-
INSERTwrite operation where the
WHEREclause does not match any row, I expected there should’ve been no locks to be held by the transaction, but I was wrong
Until the moment i read that, i expected the same. But when you read that article from Jervin and the aging bug report in the MySQL bugzilla about the gap lock, you will understand that this makes perfect sense and you start to think about a
REPLACE INTO statement being the right thing in this situation.
We just replaced that
REPLACE INTO statement with a
INSERT ON DUPLICATE KEY UPDATE statement, which does exactly this. If the primary key is found, it updates that record, if not, the record gets inserted. This solved the problems with the deadlocks in that case.
You think that
REPLACE INTO works exactly that way? Keep on reading.
Do you know what
REPLACE INTO really does? Let’s see:
CREATE TABLE a ( id INT(1) UNSIGNED NOT NULL PRIMARY KEY, data VARCHAR(255) NULL ); CREATE TABLE b ( id INT(1) UNSIGNED NOT NULL PRIMARY KEY, fk INT(1) UNSIGNED NOT NULL, data VARCHAR(255) NULL, FOREIGN KEY (fk) REFERENCES a(id) ON DELETE CASCADE ); INSERT INTO a VALUES (1, 'test'); INSERT INTO b VALUES (1, 1, 'test'); REPLACE INTO a VALUES (1, 'data');
What do you think will the result of a
SELECT \* FROM b; look like?
Exactly, the table is empty. This is because the
REPLACE INTO first deletes the found record (which triggers the delete cascade) and then creates a new one. This might be expected behavior, depending on the situation, but to my understanding this would normally not be the intended behavior and at least this was new to me.
Single Responsibility Principle (or SRP) is one of the most important concepts in software development. The main idea of this concept is: all pieces of software must have only a single responsibility.