DEV Community

jayson kibet
jayson kibet

Posted on

SQL joins Explained

Introduction

SQL JOINs are one of those concepts that seem simple in theory but confusing in practice.At first,SQL joins really confused me and didn't make any sense because i had the data i needed but in different tables.Putting them together actually felt harder than I thought.

1.Inner join

This join returns only the rows that match in both tables that you want to join.
The syntax is:select column
from t1
inner join t2
on t1.column=t2.column;

2.left join

LEFT JOIN returns all rows from the left table, plus any matching rows from the right.When there's no match,the result will be recorded as NULL.The order of the table in this join matters
the syntax is:select column
from t1
left join t2
on t1.column=t2.column

3.Right join

This is actually the opposite of the left join.It selects all rows from the right and matches them on the left.

4.full outer join

A Full outer join returns all rows from both tables.when it doesn't match, the missing side shows NULL.

5.self join

This is simply a table joining itself.Often when rows reference other rows in the same table.

Note:Always remember to put the ON function after the join

Window functions and Group by function

I once knew group by is a very powerful tool and could summarize my data in seconds and get results quickly until i came across the window functions and suddenly everything became a bit difficult.They almost did the same job but in a different way.

1.Group by

This command is used to combine rows into groups and return one result as per a group.You can group your results either by gender(female or male) or by age or any way you want your data to be presented.

Note:Do not include a where clause together with a group by function perhaps you can use a having clause to be more specific with your group by clause.

Window functions.

Window functions allow you to perform calculations across a two or more rows without removing any rows

Note:a.PARTITION BY divides the data into groups using PARTITION BY and ORDER BY specifies the order of rows within each group using ORDER BY.

1.sum()

In simple terms,the sum command creates a running total.Each row adds its value to the total so far.

2.avg()

Calculates the average value within a window.

3.Row_number()

This asigns a sequential number to each row.Every row gets a unique value.

4.Rank() and Dense rank()

when two rows have the same value RANK() and DENSE_RANK() both gives the same rank but they differ in what comes next
RANK() - It skips numbers after a tie , handles ties eg.1,2,3,3,5
DENSE_RANK() - It does not skip.However,the next rank after two tied at 2 is 3 eg.1,2,3,3,4,5.It doesnt leaves gaps in ranking

Functions every beginner should know

As a data analyst,you should go beyond just analyzing data and mastering basic commands like where function,group by and order by and so on.To become extraordinary,you need some powerful functions to help you analyze data with ease.

1.Lag()

This lets you access data from the previous row without writing complex queries.It looks at the previous row's value.

2.Lead()

This is the opposite of the lag function.It looks at the next row's value

NOTE:The Lag and Lead is perfect for making comparison over time either students performance in two exams or employees salary.

3.NTILE

This divides rows into equal groups(buckets)
It's useful for percentages,quartiles and even splitting students into perfomance bands according to their exams results.

It gives priority to the 1st groups.

4.substring

It selects the number of characters in a certain column.It helps in data cleaning and formating.


It wll select only the first 3 letters.

5.Date_part

This functions extracts either the year,month or day from a date

6.UNION

Despite UNION not being a function,i feel every beginner should master it.
It combines results from two queries and it also removes duplicates as a bonus.

common mistakes to avoid

1.running a delete or update statement

Running the delete or update statement can wipe your table at a glance.Always specify the row you want to update or delete by introducing the where clause.

2.Having and where clause

The where clause filters rows before the grouping happens.
Having filters after the aggregations.
Also,where clause does not go together with group by function.

3.Null values

Null is not the same as 0(zero).
Ignoring them leads to empty results.Always use IS NULL or IS NOT NULL.Don't use the =(equal) sign

4.Joins

When you run a join query,always include the word ON.This is where you want to join your table with common columns.

conclusion

We all learn by making mistakes.Every wrong query,every error message and every unexpected result is part of the process.Fixing them is what sharpens your thinking.The goal isn’t to write perfect queries from the start but understanding the structure of the queries that give the right answer to the question.

Top comments (0)