loading...
Cover image for SQL aggregate functions and Mariah Carey

SQL aggregate functions and Mariah Carey

helenanders26 profile image Helen Anderson Updated on ・4 min read

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Aggregate functions bring your data to life.

They allow you to take a standard table and use a function or two to build metrics and answer questions about the data. If you are familiar with Pivot Tables or Excel Functions the terminology is similar and you'll be up and running in no time.

Introduction
COUNT all the things!
COUNT DISTINCT
GROUP BY
ORDER BY
GROUP BY multiple columns
HAVING
Types of Functions available


Introduction

In this example, I'm going to be using a subset of the Billboard Top 100 dataset, available on Kaggle. I grew up in the 90s so I've loaded the 90s rows into a table called music_charts in SQL Server.

First up, I'm going to look at what I'm dealing with:

select top 10 *
  from dbo.music_charts

Alt Text

The table has loaded in fine with the four columns I am looking for. But, If I am to answer any of my burning questions about 90s music and who was the top of the charts I'm going to need to use aggregate functions.


COUNT all the things

The first way to explore this dataset is to COUNT how large it is. This is similar to the COUNT function in Excel and works in the same way.

This function counts up all the rows and returns, in this case, 1000 to the results window. 100 rows per year over 10 years. It's all there and ready to go.

select count(*) as table_count
  from dbo.music_charts

COUNT DISTINCT

Next up, I want to see how many artists are in the table. I can see that there are duplicates where an artist reached the top of the charts more than once.

We can do that by combining COUNT with DISTINCT:

select count(distinct artist) as artist_count
  from dbo.music_charts

This has returned a COUNT of 545 DISTINCT artists from the table. It still isn't super useful so we need to add more columns.


Filtering with GROUP BY

It's nice to have some facts and figures, but it would be more useful to answer some questions like 'which artist featured the most across the top 100 charts in the 1990s?'

Similar to a Pivot Table in Excel, we can do this by adding more columns.

However, it's not quite as simple as just dropping in a column. If we run this query we get an error:

select 
  artist, 
  count(artist) as artist_count
from dbo.music_charts

/* Msg 8120, Level 16, State 1, Line 1
Column 'dbo.music_charts.artist' is invalid in the select list 
because it is not contained in either an aggregate function 
or the GROUP BY clause. */

We need to explicitly say that we want all the artists to be collapsed down into one row with the count in the artist_count column next door. We do this using GROUP BY.

Alt Text

select 
  artist, 
  count(artist) as artist_count
from dbo.music_charts
group by artist

Alt Text


ORDER BY

This is getting close but needs one more clause to answer my question.

select top 5
  artist, 
  count(artist) as artist_count
from dbo.music_charts
group by artist
order by artist_count desc, artist

By ordering by the artist_count we can see that Mariah Carey appears the most in the dataset. Naturally.

Alt Text


GROUP BY multiple columns

Now we have the hang of the syntax we can start drilling into the data further and answering questions like 'Which Mariah Carey songs featured more often in the Top 100 charts in the 90s?'

select
  artist, 
  song,
  count(artist) as artist_count
from dbo.charts
where artist = 'mariah carey'
group by artist, song
order by artist_count desc, song, artist

Alt Text


HAVING

The last thing to cover off is HAVING. This allows us to put a condition on the aggregate. This is different from using a WHERE that puts a condition on individual rows.

To better illustrate how this works let's use an example where we want to see who has featured on the list more than five times.

We count up the number of artists and then use HAVING as the last step to filter on the count.

select
  artist, 
  count(artist) as artist_count
from dbo.music_charts
group by artist
having count(*) > 5
order by artist_count desc

Alt Text


Types of Functions available

But it doesn't stop with counting things.

The Basic Aggregate Functions used day to day in SQL Server are:

  • COUNT,
  • MIN,
  • MAX,
  • SUM,
  • AVG

So you can then slice and dice the 90s however you like.


Fun with functions doesn't stop here though. This post is the primer for a future post on Advanced Window Functions, coming soon!


Read more


This post originally appeared on helenanderson.co.nz

Data Analysis (14 Part Series)

1) Speed up your queries with indexes 2) Resources for beginner data analysts 3 ... 12 3) SQL concepts from A to Z 4) 6 ways to search with System Tables 5) Pearl Jam, Nirvana and Beautiful Data Visualisations 6) Entity Relationship Diagrams explained by Sonic the Hedgehog 7) How to use the VLOOKUP Excel function 8) SQL Joins without the Venn diagrams 9) SQL aggregate functions and Mariah Carey 10) Why I use SQL CTEs 11) How to customise SQL Server Management Studio (SSMS) 12) Have you VACUUMed your tables lately? 13) Why you need SQL window functions (part 1) 14) Why you need SQL window functions (part 2)

Posted on by:

helenanders26 profile

Helen Anderson

@helenanders26

Making applications go faster at Raygun, AWS Data Hero, and tag moderator on Dev.to. Database concept you don’t understand? Let me know, I’ll write a post!

Discussion

markdown guide
 

The last thing to cover off is HAVING. This allows us to put a condition on the aggregate. This is different from using a WHERE that puts a condition on individual rows.

Great explanation of having vs. where. All this time my explanation was having goes after a group by and that is just how it is. This makes much more sense.

 

@helenanders26 how do you practice sql? Do you run against an existing database? Do you standup a toy database, maybe on your laptop? Is sqlite a viable option for practice? Is there a good online service for practicing SQL?

 

There are lots of options around. The ones I like and use regularly are:

  • For Postgres, zchtodd's SQL challenges are ideal for going beyond the basics with some real world data.
  • For SQL Server the free download and AdventureWorks DB is great for seeing a real working DB in action.
  • For in-browser practice W3 schools has a great option.
  • For a guided course in the browser, Alan Hylands created SQLCrashCourse recently. We use Postgres DBs at my workplace so this is where I point new analysts to for some upskilling.
  • For those who have come from a different DB I point them to SQLLines who have a conversion tool for all those who need a quick way to switch syntax.