I've built an application and, like any lazy dev out there, I focused on the business logic, the project structure, the readability, the comments, the dependency injection, the unit tests, you know... the code. My preference is to start from top to bottom, so I create more and more detailed implementations of interfaces as I go down to the metal. The bottom of this chain is the repository, the class that handles database access, and I'd spent little time understanding or optimizing that code. I mean, it's DB access, you read or you write stuff, how difficult can it be?
When it was time to actually test it, the performance of the application was unexpectedly bad. I profiled it and I was getting reasonable percentages for the different types of code, but it was all taking too long. And suddenly my colleague says "well, I tried a few things and now it works twice as fast". Excuse me?! You did WHAT?! I had been trying a few things too, and managed to do diddly squat! Give me that PR so I can see what you did! And... it was nothing I could see.
He didn't change the code; he just added or altered the attributes decorating the properties of the models. That pissed me off, because I had previously gone through the generated SQL with SQL Profiler and it had all looked OK. So I executed my code and his code and recorded the SQL that came out:
- was it the lazy loading? Nope. The number of instructions and their order were exactly the same.
- was it the explicit declaration of the names of indexes and foreign keys? Nope. Removing those didn't affect performance.
- was it the `ChangeTracker.LazyLoadingEnabled = false` thing? Nope, I wasn't using child entities in a way that could be affected.
- was the generated SQL structured differently in some other way? No. It was exactly the same SQL! Just my code was using thousands of CPU units and his was using none.
- was it magic? Probably, because it made no sense whatsoever! Except...
Entity Framework generates simple SQL queries, but it doesn't execute them as you and I would. It constructs a string, then uses `sp_executesql` to run it. Something like this:
```sql
exec sp_executesql N'SELECT TOP(1) [p].[ID], [p].[TXT], [p].[LUP_TS]
FROM [sch].[table] AS [p]
WHERE [p].[ID] = @__p_0',N'@__p_0 nvarchar(64)',@__p_0='xxxx'
```
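For context, the C# side that produces this is usually something perfectly innocent-looking. A minimal sketch of a typical repository method; the `MyRepository`, `MyDbContext`, `MyEntity` and `GetById` names are my own inventions, not the project's actual code:

```csharp
using System.Linq;

// Hypothetical repository - all names are invented for illustration.
public class MyRepository
{
    private readonly MyDbContext _context;

    public MyRepository(MyDbContext context) => _context = context;

    public MyEntity? GetById(string id)
    {
        // EF Core translates this into the parameterized sp_executesql
        // call shown above, inferring the SQL type of the parameter
        // from the model configuration.
        return _context.Set<MyEntity>().FirstOrDefault(e => e.ID == id);
    }
}
```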
Do you see the problem? I didn't until I started comparing the same SQL in the two versions. It was the type of the parameters! Note that the aptly named parameter `@__p_0` is declared as `NVARCHAR`. The actual column in the database was `VARCHAR`! That means the query above was always converting values unnecessarily in order to compare them, and an implicit conversion on the column side can also stop SQL Server from using an index on that column efficiently. The waste of resources was staggering!
How do you declare the exact database type of your columns? There are multiple ways. In my case there were three different problems (a sketch of the fixed model follows the list):
- no `Unicode(false)` attribute on the string columns - meaning EF expected the columns to be `NVARCHAR`
- no `TypeName` parameter in the `Column` attribute where the columns were `NTEXT` - meaning EF expected them to be `NVARCHAR(MAX)`
  - I guess one could skip the `Unicode` thing and instead just specify the type name, but I haven't tested it
- using `MaxLength` instead of `StringLength` - because even if their descriptions are very similar and `MaxLength` sounds like it applies in more cases, it's `StringLength` that EF wants
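To make this concrete, here is a minimal sketch of what the fixed model could look like. The column names come from the captured SQL above, but the class itself, the key and the lengths are assumptions, not the project's actual code (`[Unicode]` requires EF Core 6 or later):

```csharp
using System;
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using Microsoft.EntityFrameworkCore; // for [Unicode]

[Table("table", Schema = "sch")]
public class MyEntity
{
    // VARCHAR column: [Unicode(false)] plus an explicit length makes EF
    // declare the parameter as varchar(64) instead of nvarchar(64).
    [Key]
    [Unicode(false)]
    [StringLength(64)]
    public string ID { get; set; } = null!;

    // Legacy NTEXT column: spell out the store type explicitly,
    // otherwise EF assumes nvarchar(max).
    [Column(TypeName = "ntext")]
    public string? TXT { get; set; }

    public DateTime LUP_TS { get; set; }
}
```

If you'd rather keep this kind of configuration out of the model classes, the fluent API has equivalents: `IsUnicode(false)`, `HasMaxLength(...)` and `HasColumnType("ntext")`. Either way, the generated `sp_executesql` call should then declare `@__p_0` with the same type as the column, and the implicit conversion goes away.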
From 40-50ms per processing loop, it dropped to 21ms just by fixing these.
Long story short: parametrized SQL executed with `sp_executesql` hides a possible performance issue if the columns that you compare or extract have slightly different types than those of the parameters.
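One way to catch this kind of mismatch without firing up SQL Profiler is to have EF Core log its generated commands and compare the declared parameter types against the actual column types. A small sketch, assuming EF Core 5+ and the hypothetical `MyDbContext`/`MyEntity` from above (the connection string is a placeholder):

```csharp
using System;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;

public class MyDbContext : DbContext
{
    public DbSet<MyEntity> Entities => Set<MyEntity>();

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder
            .UseSqlServer("<connection string>")
            // Print every generated command, so the sp_executesql text and its
            // parameter declarations (nvarchar vs varchar) are easy to inspect.
            .LogTo(Console.WriteLine, LogLevel.Information);
    }
}
```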
Go figure. I hate Entity Framework!