SQL like syntax on all your data sources, DB, XML, and even objects
Follow me on Twitter, happy to take your suggestions on topics or improvements /Chris
What is Linq? Linq stands for Language Integrated Query. It is the name for a set of technologies based on the integration of query capabilities directly into the C# language. English? it means you have SQL like syntax on Lists, XML files, Databases, well any kind of queryable data. It is probably the most important thing you want to learn in C#.
TLDR; This article describes the usage of the library Linq that is part of the .NET framework. Linq enables you to query all sorts of data sources with a SQL like syntax. IMO it's the most powerful feature to ever hit the .NET framework.
If you are not using Linq when coding .NET - you are missing out.
References
More on joins in Linq
There is more than one way to do joins, this should cover most scenarios.Linq by example site
Great site to quickly refer to the syntax for different types of operators.Operators - overview page
Good docs page that intros you to operators, lots of good sub pages for further navigation.YAML for Linq
Yes, you can even do YAML with Linq.Linq pad
An invaluable tool for trying out Linq expressionsLinq for XML
Good docs page for doing Linq with XML.
The Problem
Ok so Linq is supposedly changing everything. How exactly?
Today you are most likely doing things such as grouping, sorting, transforming your data from one Poco, one Plain Old C# Object, to another.
Yea, I know, not the most fun work, but it needs to be done. What about it?
You said it was boring. I can tell you why it's boring, it's not declarative and you use utility methods like Array.sort
to get the job done. This leads to you having to type way more than you need to.
That's what programming is I thought, a patchwork of different approaches and libraries that spits out the correct data at the other end.
It doesn't have to be you know. You can have a high-level approach, that's declarative, that works for all data source types, or well most of them.
There's that word again declarative and tell me about the data sources, I'm intrigued?
Declarative is more about describing what you want, without saying exactly how to achieve it. Like SQL for example:
SELECT column, column1
FROM table
WHERE condition
ORDER BY column
Ok so you are saying I should be using something high-level, SQL like to transform, group, sort, etc. Instead of my horrible mix of utility functions and for-loops, I'm using today?
Yes, exactly that. :)
What about the other data sources you talked about?
Linq is already implemented to work with an ORM like Entity Framework, XML and just plain code. So you just define a Linq query in pretty much the same way regardless of the data source. Isn't that neat?
Yea for sure. I will need to see this to believe it though.
Sure thing.
One more question, does it work for YAML?
Yes it does, albeit with a 3rd party library like
https://github.com/aaubry/YamlDotNet
To sum up, just asking for the data we need is possibly error prone and we need to learn a new approach to do it for each data type. Linq solves all that and yes you have tooling support, intellisense.
Sweet, I was waiting for you to say intellisense, teach me Obi Wan
Patience my young Padawan. :)
What
So what does Linq give us?
First class construct,
With LINQ, a query is a first-class language construct, just like classes, methods, events.Language keywords and familiar operators,
You write queries against strongly typed collections of objects by using language keywords and familiar operators.Consistent query experience, The LINQ family of technologies provides a consistent query experience for objects (LINQ to Objects), relational databases (LINQ to SQL), and XML (LINQ to XML).
Same query language regardless of the type of source, now THAT is powerful :)
Yes sounds amazing.
It is.
What can I use it on?
Everything that supports IEnumerable
or IEnumerable<T>
.
That all sounds good but how does it look?
Well, the first time I saw Linq queries I thought, Sweet, I got SQL in my source code. I know SQL so that's half the battle :)
Can you show me?
Sure. Here's an example:
IEnumerable<Character> rows =
from row in characters
where row.name = "Gandalf"
select row;
You're right, that's totally SQL.
SELECT
,FROM
,WHERE
:) It does look backward though, the table first I mean?
You are correct, and this is on purpose, it's so we are context-aware and have IntelliSense/tooling support early on.
Oh, so by defining the data source first, I know what columns/properties are on it?
Yes, exactly.
Learning Linq
You may think you know Linq just because of the above example but there are some concepts that we need to know to understand how it operates. We should think of Linq as having a three step process:
- Get the data source, this data source could be anything, a list, an XML file, a table in a Database.
-
Create the query, query represents the intent, something that you want to do. It's important to understand that you are NOT getting the data at this point but something of type
IQueryable
, the question, not the actual data. - Execute the query, this is when you actually get the data.
Get the data source
As I stated before this could look in any number of ways, depending on the data source, like the below:
// objects
var scores = new List() { 7, 5, 4, 3, 8, 5 };
// XML
XElement characters = XElement.Load(@"c:\characters.xml");
// DB using Entity Framework
using (var db = new MovieEntities())
{
var movies = db.Movies
}
Create the query
Here we are essentially defining a query in the following way:
IQueryable<type> =
FROM row in <data source>
WHERE <condition>
SELECT <columns from row>
Remember that the above only represent an intent of what we want, we are not actually executing the query.
Execute the query
So when do we actually execute the query? It's when we access it, for example start looping through it like this:
var scores = new List() { 7, 5, 4, 3, 8, 5 };
var result =
from score in scores
where score > 5
select score;
foreach(var r in result) {
Console.WriteLine("Score: {0}", r);
}
Force execution
We can make the query happen right away by using certain operators. Example of these operators are Count()
, Max()
, Average()
and First()
. We can also be using ToList()
or ToArray()
, this forces the data to be fetched and turned into a List
or Array
type.
Here's an example:
var scores = new List() { 7, 5, 4, 3, 8, 5 };
var result = from score in scores
where score > 5
select score;
var listResult = result.ToList();
Why does this matter? I've found it matters when you are dealing with for example databases. It's a quite common mistake that you think you've executed the query and you have really only created the query. Then when you start iterating through your collection you initiate a call to your database and that can be really bad for performance.
Next, let's go through a set of demos so you hopefully feel like you understand enough Linq to use it in your service layer as well as towards databases and other data sources.
There are two versions to create queries
I started this article comparing Linq to SQL and that works if you got a background working with databases. But there is a different way to create queries namely by using method names. We call the SQL like style Query Syntax style
and the more method driven style Selector style
. Let's show an example of both styles so you can compare and see what works for you.
// Query Syntax style
from c in characters
where c.name = "Gandalf"
select c;
VS
// Selector style
characters
.Where(c => c.name = "Gandalf")
.Select(c => c.name)
It's very similar, yet different. I clearly prefer the SQL like way of doing it but you do you :)
Simple operators
In this demo we will learn the following:
- Filter and select columns from a data source
- Transform the output the response
Filter and Select
Using filtering is as simple as using the Where
construct.
from c in characters
where c.Name == "Gandalf"
select c.Name;
We see above how we filter out so we only get the record matching a specific condition. For selection though it becomes different. If we are used to SQL, it's tempting to think we can just comma separate between columns, if we need more than one c.Name, c.Age
. That doesn't work. How you would do that in Linq is to instead create an anonymous object like so:
from c in characters
where c.Name == "Gandalf"
select new { Name = c.Name, Age = c.Age };
or use a typed object like so:
from c in characters
where c.Name == "Gandalf"
select new Person(){ Name = c.Name, Age = c.Age };
Complex operators
If we have the mental model of operating on the data like it was data in Database there are a few things that should be possible to do like:
- Ordering, we should be able to sort our data in Ascending and Descending order
- Grouping, we should be able to group our data based on different keys
- Joining, it should be possible to join different data sources together based on a common key
- Hierarchical queries, this is really a special case of joining the data together but depending on the data is given to you it's either different data sources that we need to stitch together or it is one object graph that we need to traverse.
Let's work through each of the scenarios above:
Ordering
So for ordering, we need to use the orderby
construct, like so:
var characters = new List<Character>(){
new Character()
{
Name = "Bilbo",
Age= 90,
Type="Hobbit"
},
new Character(){
Name = "Ent number 1",
Age= 1500,
Type="Ent"
},
new Character()
{
Name = "Gandalf",
Age= 770,
Type="Maya"
}
};
var orderedResult =
from ch in characters
orderby ch.Age
select new { Name = ch.Name, Age = ch.Age };
foreach(var ch in orderedResult)
{
Console.WriteLine("Ordered , Name: {0}, Age: {1}", ch.Name, ch.Age);
}
Note, how the orderby
statement happens before the select
if you compare it to SQL.
This produces the following result:
Ordered , Name: Bilbo, Age: 90
Ordered , Name: Gandalf, Age: 770
Ordered , Name: Ent number 1, Age: 1500
What about ascending/descending order? That's simple we just add the keyword descending
or ascending
after our orderby
statement like so:
var orderedResult =
from ch in characters
orderby ch.Age descending
select new { Name = ch.Name, Age = ch.Age };
// alt syntax
var oo =
characters
.OrderByDescending(c => c.Age)
.Select(c => new { Name = c.Name, Age = c.Age });
Grouping
Grouping is something we need to do quite often. There are a few things to grouping that we should consider:
- What to group, this is your call but imagine having a record of students, what could be interesting to group on. All students from the same place, maybe enrollment year or something else.
- Calculations/Aggregations, when you produce a group you usually want to do something with it like maybe count the number of rows or do some other operations like calculating the average
- How to read from a grouping, we will learn how to iterate over the group that is produced from our grouping statement.
Let's have a look at a couple of examples where we address the above. First out let's look at simple grouping:
Simple grouping
var characters = new List<Character>(){
new Character()
{
Name = "Bilbo",
Age= 290,
Type="Hobbit"
},
new Character()
{
Name = "Frodo",
Age = 70,
Type = "Hobbit"
},
new Character()
{
Name = "Gandalf",
Age= 770,
Type="Maya"
}
};
var result =
from c in characters
group c by c.Type into gr
select new { Characters = gr, Race = gr.Key };
// alt syntax
var groupedResult =
characters
.GroupBy( c => c.Type)
.Select(gr => new { Characters = gr, Race = gr.Key });
foreach (var obj in result)
{
Console.WriteLine("Key: --{0}--", obj.Race);
foreach (var c in obj.Characters)
{
Console.WriteLine("Name: {0}", c.Name);
}
}
Let's zoom in on the code doing the grouping:
from c in characters
group c by c.Type into gr
select new { Characters = gr, Race = gr.Key };
We see above how we group on the property Type
. Additionally, we see how we produce a result consisting of the group gr
and the key we group on gr.Key
, i.e Type
.
Finally, we then iterate over the result we need to do so with a nested for-loop as we first get a list of groups and inside of every group we have a number of entries, like so:
foreach (var obj in result)
{
Console.WriteLine("Key: --{0}--", obj.Race);
foreach (var c in obj.Characters)
{
Console.WriteLine("Name: {0}", c.Name);
}
}
Let's look at a second example but this time we will do a calculation, in this computing the total age for a group. Given the same input data we produce the following statement:
var totalAgeResult =
from c in characters
group c by c.Type into gr
select new
{
Type = gr.Key,
TotalAge = gr.Sum(c => c.Age)
};
// alt syntax
var totalAgeResult =
characters
.GroupBy(c => c.Type)
.Select(gr => new
{
Type = gr.Key,
TotalAge = gr.Sum(c => c.Age)
});
Console.WriteLine("Total age by Race");
foreach (var obj in totalAgeResult)
{
Console.WriteLine("Race: {0}", obj.Type);
Console.WriteLine("TotalAge: {0}", obj.TotalAge);
}
Let's zoom in especially on the select
where we produce a record consisting of the grouping key and our computation:
select new
{
Type = gr.Key,
TotalAge = gr.Sum(c => c.Age)
}
Joining
Joining data is something we are used to from working with Databases. In fact we have many different joins for different purposes. Let's talk about sum of them:
- inner join, this join takes all the records that match a certain condition and exclude any matching records
- left join, this join does an attempt of stitching together records that may not match. Where it matches we get proper values and where it doesn't we get NULL. I've usually seen this one used when we want to produce for example a sales record of your inventory. Not all items for sale might actually have been sold but for the ones who have you want to know how many have. Usually, the output looks something like this:
--------- --------
ProductId Quantity
--------- --------
1 5
2 NULL
So how can we use Join with Linq? Here's the syntax for a simple join, but keep in mind there are many joins and this is a big topic, refer to the References section to see more joins:
// our data we are about to join
var movies = new List<Movie>()
{
new Movie(){
Id = 1,
Name="Episode IV - A new Hope",
ReleaseYear=1977
},
new Movie(){
Id = 2,
Name="Episode V - Empire Strikes back",
ReleaseYear=1980
},
new Movie(){
Id = 3,
Name="Episode VI - Return of the Jedi",
ReleaseYear=1983
}
};
var movieCharacters = new List<MovieCharacter>
{
new MovieCharacter(){ MovieId = 1, Name= "Luke Skywalker"},
new MovieCharacter(){ MovieId = 1, Name= "Princess Leia"}
};
var resultJoined =
from m in movies
join c in characters on m.Id equals c.MovieId
select new
{
Name = c.Name,
Movie = m.Name
};
var resultJoinedAlt =
movies
.Join(
characters, // join with collection
m => m.Id, // key from outer collection
c => c.MovieId, // key from inner collection
(m, c) => new { Name = c.Name, Movie = m.Name } // result selector
);
foreach(var r in resultJoined)
{
Console.WriteLine("Joined {0} {1}", r.Name, r.Movie);
}
Let's zoom in on our join
query:
var resultJoined =
from m in movies
join c in characters on m.Id equals c.MovieId
select new
{
Name = c.Name,
Movie = m.Name
};
We see that we, just like in SQL, start with one data source movies
and then we bind it to another data source characters
using join
and on
to state what columns belong together. In this case, we are saying that Id
from movies
is the same thing as MovieId
from characters
.
Producing a result like this:
Joined Luke Skywalker Episode IV - A new Hope
Joined Princess Leia Episode IV - A new Hope
But I thought Luke & Leia were in all the movies?
Ssh, don't ruin the demo ;)
Hierarchical data
Let's talk about hierarchical data. In the join example, we had two separate data sources movies
and characters
. Sometimes we have everything in one object graph like so:
var order = new Order()
{
Id = 1,
Created = DateTime.Now,
OrderItems = new List<OrderItem>()
{
new OrderItem()
{
Quantity = 1,
Price = 10,
Product = new Product()
{
Name = "Random ssschwaag"
}
},
new OrderItem()
{
Quantity = 3,
Price = 100,
Product = new Product()
{
Name = "Stickers pack"
}
}
}
};
The good news is that Linq can help us to ask for data over the entire graph using an operator referred to as SelectMany()
. When we use our SQL like syntax however it looks a lot like the join
statement. Let's show it:
var result = from o in orders
from item in o.OrderItems
select new {
OrderId = o.Id,
Sum = item.Quantity * item.Price,
Name = item.Product.Name
};
// alt syntax
var r = orders
.SelectMany(
o => o.OrderItems, // go at depth
(order, item) => new // result selector, containing outer and inner collection
{
OrderId = order.Id,
Sum = item.Quantity * item.Price,
Name = item.Product.Name
}
);
What we can see from the above is how we have two data sources orders
and o.OrderItems
. The first is the order
itself and the other data source is the nested OrderItems
on the order
. Then we create a projection consisting of data from both an order
, orderitem
and even the product
:
select new {
OrderId = o.Id,
Sum = item.Quantity * item.Price,
Name = item.Product.Name
}
foreach(var item in result)
{
Console.WriteLine(" Nested, Order: {0} Sum: {1} Name: {2}", item.OrderId, item.Sum, item.Name);
}
This produces the result:
Nested, Order: 1 Sum: 10 Name: Random ssschwaag
Nested, Order: 1 Sum: 300 Name: Stickers pack
As you can see this is quite powerful.
Summary
We've covered how the what and why of Linq and hopefully you saw the need for one unifying language regardless of the data source. Furthermore, we've covered different operators in Linq and how you can use them to group, transform and compute what you need. It's impossible to cover everything so I urge you to have a look at the references section of this article to dive deeper. Hopefully, your understanding is now good enough so you can study further on your own.
Top comments (1)
very well explained, it's quite helpful :)