DEV Community

Alex Ruzenhack
Alex Ruzenhack

Posted on

Using C# LINQ API to perform SQL IN logic at WHERE clause

I was working on a project with User and Costumer identificação. The first is generated by the web app, and the second by the business department.

So I had to develop a logic to make those match between identifications.

The User entity needs to know the Costumer identification and, in order to make the match, first I selected the users without a costumer and extracted a list of emails. Then, I selected the customers based on this email list. This statement can be transcribed to SQL syntax, like this:

SELECT *
FROM Customers
WHERE email IN (SELECT email FROM Users WHERE CustomerId IS NULL)
Enter fullscreen mode Exit fullscreen mode

But how to translate the SQL to C# LINQ API?

  1. First, make the selection from Users, extracting the list of emails
  2. Select the Customers with those emails

This way:

var emails = _userManager.Users
    .Where(user => user.CustomerId == null)
    .Select(user => user.Email) // extract the emails from users
    .ToList();

var customers = _applicationRepository.GetCustomers()
    .Where(customer => emails.Contains(customer.Email)) // the Contains method carry the IN logic when translated to SQL script
    .ToList();
Enter fullscreen mode Exit fullscreen mode

Now we have the match programmatically.

That's it!

Reference

Top comments (9)

Collapse
 
dance2die profile image
Sung M. Kim

Thank Alex,
the translation looked easy to understand as the LINQ version matches with SQL statement 👍.


And I have a question regarding the conversion.

Would the LINQ version fetch all data from database first and filter (using Where clause) in memory?

or

would the query create an optimal syntax to send to the database and fetch only filtered records?

Collapse
 
courier10pt profile image
Bob van Hoove • Edited

Calling ToList() results in execution of the query. So the first statement will make an in memory list of email addresses.

To get a better feel for what's happening you can intercept the actual queries. Assuming Entity framework is used: Log Queries executed by Entity Framework DbContext.

Collapse
 
dmfay profile image
Dian Fay • Edited

I stopped writing C# right as LINQ got popular so I don't know it too well but I believe this does store the list of emails in memory.

In SQL you could do this:

SELECT c.*
FROM customers AS c
JOIN users AS u ON u.email = c.email
WHERE u.customer_id IS NULL

and achieve the same result without a subquery. I don't know how you'd go about translating that to LINQ though.

Collapse
 
rafalpienkowski profile image
Rafal Pienkowski

In my opinion, this is a better approach, which doesn't require an additional call to the database and doesn't require to store entities in the memory (which could have terrible results in case of a huge collection).
And of course, it is available in LINQ.

Collapse
 
alexruzenhack profile image
Alex Ruzenhack • Edited

It's true! But the tables are in different contexts, which means they are in different databases. So, it's not possible to make a JOIN between them.

Thread Thread
 
dmfay profile image
Dian Fay

My condolences :)

Collapse
 
alexruzenhack profile image
Alex Ruzenhack

Here is the EF SQL of "Users":

Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (110ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [user].[Id], [user].[Email], [user].[CustomerId], [user].[UserName]
FROM [AspNetUsers] AS [user]
WHERE [user].[CustomerId] IS NULL

And here is the EF SQL of "Customers":

Microsoft.EntityFrameworkCore.Database.Command:Information: Executed DbCommand (16ms) [Parameters=[], CommandType='Text', CommandTimeout='30']
SELECT [t].[Id], [t].[Email], [t].[FullName], [t].[RegisteredAt],  [t].[CreatedAt], [t].[UpdatedAt]
FROM [Customers] AS [t]

As we can see, the IN logic really occurs in memory.

Collapse
 
dance2die profile image
Sung M. Kim

Thanks for the clarification Alex 😎

Collapse
 
jfrankcarr profile image
Frank Carr

I prefer to use stored procedures and table variables for this kind of query when it gets complex or is working against a large amount of data. I prefer to let the database do the job its been optimized for, parsing and sorting data, and use C# and LINQ for stuff SQL isn't optimized to do well, like string manipulation, or to work with smaller amounts of data, like lookup tables.

Of course, using techniques like in this article are good when there's a "turf war" between DBA's and application developers or when the application team has limited SQL skills and a DBA isn't available to write optimal procs.