Beverline Otiende

Posted on Mar 22 • Edited on Mar 26

Advanced SQL for Data Analytics: Advanced Techniques Every Data Analyst Should Know

#sql #database #datascience

In today’s data-driven world, many organizations heavily rely on data to help them make informed decisions, to optimize their operations and to help them gain a competitive advantage over their competitors.
At the heart of this data reliance and transformation lies SQL (Structured Query Language), which is the fundamental tool used to interact with relational databases.
SQL is the backbone of data analytics because most business data is usually stored in relational databases such as PostgreSQL, MySQL or Microsoft SQL Server.

Data analysts use SQL to:
•Retrieve data for reporting and dashboards
•Clean and transform raw data
•Perform aggregations and calculations
•Analyze trends and patterns
•Support decision-making processes

While basic SQL skills such as creating databases and tables, inserting data into tables, updating tables and deleting from tables are essential, they are not enough to handle the complexity of real world data problems.
Advanced SQL techniques are required to help data analysts to extract deeper insights, handle large datasets efficiently and solve complex business challenges.
These advanced SQL techniques go beyond simple queries and include concepts such as

Joins
Window functions
Common Table Expressions (CTEs)
Subqueries
Stored Procedures

In this article, I explore more about these advanced SQL techniques and how they are applied in real-world data analytics scenarios.
This article is designed to help beginners understand not just the syntax, but also the practical application of these techniques in business environments.

JOINS

Joins in SQL allow analysts to combine data from different tables into one result set based on a related column.

Let us use these two tables below (Products table and Purchases table) to understand more about the different types of SQL Joins.

Below is the products table:

products Table
| # | Product ID | Product Name | Price  |
| - | ---------- | ------------ | ------ |
| 1 | 1          | Laptop       | 55,000 |
| 2 | 2          | Phone        | 10,000 |
| 3 | 3          | Charger      | 2,000  |
| 4 | 4          | Mouse        | 500    |

Below is the purchases table:

purchases table
| # | sale_id  |product_id| quantity | total_purchase |       |
| - | --------| -------- | --------- | -------------- |
| 1 | 1       | 1        | 2         | 110,000        |
| 2 | 2       | 2        | 3         | 30,000         |
| 3 | 3       | 3        | 2         | 4,000          |
| 4 | 4       | [NULL]   | 1         | 3,500          |
| 5 | 5       | [NULL]   | 4         | 25,000         |
| 6 | 6       | [NULL]   | 5         | 70,000         |

Types of SQL Joins
a) INNER JOIN
Inner Join only returns rows that have matching values in both tables being joined.

E.g. we can check for rows with matching values from our products and purchases tables using the sql query below.

 SELECT * 
 FROM products p
 INNER JOIN purchases s
    on p.product_id = s.product_id;

From this INNER Join query above, you get:

| # | product_id | product_name | price  | sale_id | product_id | quantity | total_purchase |
| - | ---------- | ------------ | ------ | ------- | ---------- | -------- | -------------- |
| 1 | 1          | Laptop       | 55,000 | 1       | 1          | 2        | 110,000        |
| 2 | 2          | Phone        | 10,000 | 2       | 2          | 3        | 30,000         |
| 3 | 3          | Charger      | 2,000  | 3       | 3          | 2        | 4,000          |

b) LEFT JOIN (LEFT OUTER JOIN)
Left Joins returns all the rows from the left table (first table selected) and matching rows from the right table (second table).
If there is no match from the right table, it fills with NULL.

For example lets check for all the rows and columns from the products table and only the corresponding matching values from the purchases table. we perform a Left Join using the query below.

SELECT * 
FROM products p
LEFT JOIN purchases s
   on p.product_id = s.product_id;

From this Left Join we get;

| # | product_id | product_name | price  | sale_id | product_id | quantity | total_purchase |
| - | ---------- | ------------ | ------ | ------- | ---------- | -------- | -------------- |
| 1 | 1          | Laptop       | 55,000 | 1       | 1          | 2        | 110,000        |
| 2 | 2          | Phone        | 10,000 | 2       | 2          | 3        | 30,000         |
| 3 | 3          | Charger      | 2,000  | 3       | 3          | 2        | 4,000          |
| 4 | 4          | Mouse        | 500    | [NULL]  | [NULL]     | [NULL]   | [NULL]         |

c) RIGHT JOIN (RIGHT OUTER JOIN)
Right Joins returns all the rows from the right table (second table selected) and matching rows from the left table (first table).
If there is no match from the left table, it fills with NULL.

For example we can do a Right Join to get the corresponding rows from the products table and all the rows from the purchases table using the query below.

SELECT * 
FROM products p
RIGHT JOIN purchases s
   on p.product_id = s.product_id;

From this right join, we get;

| # | product_id | product_name | price  | sale_id | product_id | quantity | total_purchase |
| - | ---------- | ------------ | ------ | ------- | ---------- | -------- | -------------- |
| 1 | 1          | Laptop       | 55,000 | 1       | 1          | 2        | 110,000        |
| 2 | 2          | Phone        | 10,000 | 2       | 2          | 3        | 30,000         |
| 3 | 3          | Charger      | 2,000  | 3       | 3          | 2        | 4,000          |
| 4 | [NULL]     | [NULL]       | [NULL] | 4       | [NULL]     | 1        | 3,500          |
| 5 | [NULL]     | [NULL]       | [NULL] | 5       | [NULL]     | 4        | 25,000         |
| 6 | [NULL]     | [NULL]       | [NULL] | 6       | [NULL]     | 5        | 70,000         |

d) FULL JOIN (FULL OUTER JOIN)
Full Joins returns all rows from both tables. It is basically a combination of Left Join and Right Join.

For example lets do a Full Join to find all the rows from both the tables using the syntax below.

SELECT * 
FROM products p
FULL JOIN purchases s
   on p.product_id = s.product_id;

From this Full Join we get;

| # | product_id | product_name | price  | sale_id | product_id | quantity | total_purchase |
| - | ---------- | ------------ | ------ | ------- | ---------- | -------- | -------------- |
| 1 | 1          | Laptop       | 55,000 | 1       | 1          | 2        | 110,000        |
| 2 | 2          | Phone        | 10,000 | 2       | 2          | 3        | 30,000         |
| 3 | 3          | Charger      | 2,000  | 3       | 3          | 2        | 4,000          |
| 4 | [NULL]     | [NULL]       | [NULL] | 4       | [NULL]     | 1        | 3,500          |
| 5 | [NULL]     | [NULL]       | [NULL] | 5       | [NULL]     | 4        | 25,000         |
| 6 | [NULL]     | [NULL]       | [NULL] | 6       | [NULL]     | 5        | 70,000         |
| 7 | 4          | Mouse        | 500    | [NULL]  | [NULL]     | [NULL]   | [NULL]         |

e) CROSS JOIN
This returns every combination of rows from both tables and the number of rows multiply.

SELECT * 
FROM products p
CROSS JOIN purchases s;

f) SELF JOIN
Self Join is when you join a table to itself. You need to use aliases to refer to the same table in two roles

Lets explore the rest of the advanced sql functions using these 2 tables below.** clients table and sales table**
Here is the clients table

clients table
| # | client_id | client_name   | city         |
| - | --------- | ------------- | ------------ |
| 1 | 1         | Alice Johnson | New York     |
| 2 | 2         | Bob Smith     | Los Angeles  |
| 3 | 3         | Charlie Brown | Chicago      |
| 4 | 4         | Diana Prince  | Houston      |
| 5 | 5         | Ethan Hunt    | Phoenix      |
| 6 | 6         | Fiona Clark   | Philadelphia |

Below is the sales table

sales table
| # | sale_id | client_id | product_name | quantity | sale_amount |
| - | ------- | --------- | ------------ | -------- | ----------- |
| 1 | 1       | 1         | Laptop       | 3        | 75,000      |
| 2 | 2       | 2         | Phone        | 2        | 25,000      |
| 3 | 3       | 3         | Tablet       | 10       | 25,000      |
| 4 | 4       | 4         | Monitor      | 1        | 35,000      |
| 5 | 5       | 5         | Printer      | 7        | 35,000      |
| 6 | 6       | 6         | Keyboard     | 2        | 1,000       |
| 7 | 7       | [NULL]    | Laptop       | 3        | 85,000      |

WINDOW FUNCTIONS

Window functions in SQL allow you to perform calculations across a set of rows that are somehow related to the current row without grouping them into a single result.
You can use Window functions to rank rows, calculate cumulative totals or to find the difference between consecutive rows in a dataset.
Unlike like aggregate functions like SUM(), AVG(), COUNT(), MIN(), MAX() which return a single result for a group of rows, Window functions return a value for each row while still providing information from the related rows.

The basic syntax of Window function include;

PARTITION BY: This splits the data into groups.
ORDER BY: defines calculation order
OVER(): defines the window

Common window functions include:
a)ROW_NUMBER(): This assigns a unique number to each row in the table.
For example lets assign unique row numbers to our sales table and order by sale_amount in descending order. We use the sql query below;

 SELECT *, 
    ROW_NUMBER() OVER (ORDER BY sale_amount desc) AS row_number
FROM sales;

From this query, we get unique row numbers assigned to our table as shown below. Even if more than one rows have the same value, they get different row numbers.
We get the assigned row_numbers added as an extra column with numbers 1,2,3,4,5,6,7 as;

| sale_id | client_id | product_name | quantity | sale_amount | row_number |
| ------- | --------- | ------------ | -------- | ----------- | ---------- |
| 7       | [NULL]    | Laptop       | 3        | 85,000      | 1          |
| 1       | 1         | Laptop       | 3        | 75,000      | 2          |
| 4       | 4         | Monitor      | 1        | 35,000      | 3          |
| 5       | 5         | Printer      | 7        | 35,000      | 4          |
| 2       | 2         | Phone        | 2        | 25,000      | 5          |
| 3       | 3         | Tablet       | 10       | 25,000      | 6          |
| 6       | 6         | Keyboard     | 2        | 1,000       | 7          |

b)RANK(): This assigns ranks to each rows with ties getting the same rank but leaving gaps in subsequent ranks.
Lets use our sales table to rank our rows and order by sale_amount in descending order.
We use the sql query;

 SELECT *, 
    RANK() OVER (ORDER BY sale_amount desc) AS rank
FROM sales;

From this query we get a new column rank, and rows with the same sales amount get the same rank value and then there are gaps to subsequent ranks. i.e we get 1, 2, 3, 3, 5,5,7. there is a gap between 3 and 5 because we have two rank 3s. same as for the two rank 5s then rank 7.

| sale_id | client_id | product_name | quantity | sale_amount | rank  |
| ------- | --------- | ------------ | -------- | ----------- | ----- |
| 7       | [NULL]    | Laptop       | 3        | 85,000      | 1        |
| 1       | 1         | Laptop       | 3        | 75,000      | 2          |
| 4       | 4         | Monitor      | 1        | 35,000      | 3          |
| 5       | 5         | Printer      | 7        | 35,000      | 3          |
| 2       | 2         | Phone        | 2        | 25,000      | 5          |
| 3       | 3         | Tablet       | 10       | 25,000      | 5          |
| 6       | 6         | Keyboard     | 2        | 1,000       | 7          |

c)DENSE_RANK() : This assigns ranks to each rows with ties getting the same rank but does not leave gaps in subsequent ranks.
Lets use our sales table to dense_rank our rows and order by sale_amount in descending order.
We use the sql query;

 SELECT *, 
    DENSE_RANK() OVER (ORDER BY sale_amount desc) AS dense_rank
FROM sales;

From this query we get a new column dense_rank, and rows with the same sales amount get the same dense_rank value and there are no gaps to subsequent ranks. i.e we get 1, 2, 3, 3, 4, 4, 5.
From this DENSE_RANK query, we get;

| sale_id | client_id | product_name | quantity | sale_amount | dense_rank |
| ------- | --------- | ------------ | -------- | ----------- | ---------- |
| 7       | [NULL]    | Laptop       | 3        | 85,000      | 1          |
| 1       | 1         | Laptop       | 3        | 75,000      | 2          |
| 4       | 4         | Monitor      | 1        | 35,000      | 3          |
| 5       | 5         | Printer      | 7        | 35,000      | 3          |
| 2       | 2         | Phone        | 2        | 25,000      | 4          |
| 3       | 3         | Tablet       | 10       | 25,000      | 4          |
| 6       | 6         | Keyboard     | 2        | 1,000       | 5          |

d)LEAD(): Allows you to compare the value of the current row with the value of the next row.
To demonstrate LEAD, lets do a Right join on the clients and sales table. Then calculate the lead sale amount and order by sale amount in descending order.
We use the query below;

 SELECT s.sale_id, c.client_name, s.product_name, s.sale_amount, 
    LEAD(s.sale_amount) OVER (ORDER BY s.sale_amount DESC) AS next_sale_amount
FROM clients c
RIGHT JOIN sales s
    on c.client_id = s.client_id;

From this query we get a new column next_sale_amount and it helps us compare the current sale_amount to the next sale_amount in descending order as shown below;

| sale_id | client_name   | product_name | sale_amount |next_sale_amount   |
| ------- | ------------- | ------------ | ----------- | ------ |
| 7       | [NULL]        | Laptop       | 85,000      | 75,000 |
| 1       | Alice Johnson | Laptop       | 75,000      | 35,000 |
| 4       | Diana Prince  | Monitor      | 35,000      | 35,000 |
| 5       | Ethan Hunt    | Printer      | 35,000      | 25,000 |
| 2       | Bob Smith     | Phone        | 25,000      | 25,000 |
| 3       | Charlie Brown | Tablet       | 25,000      | 1,000  |
| 6       | Fiona Clark   | Keyboard     | 1,000       | [NULL] |

e)LAG(): Allows you to compare the value of the current row with the value of the previous row.
To demonstrate LAG, lets do a Right join on the clients and sales table. Then calculate the LAG sale_amount and order by sale_amount in descending order.
We use the query below;

SELECT s.sale_id, c.client_name, s.product_name, s.sale_amount, 
      LAG(s.sale_amount) OVER (ORDER BY s.sale_amount DESC) as prev_sale_amount
FROM clients c
RIGHT JOIN sales s
    on c.client_id = s.client_id;

From this query we get a new column prev_sale_amount and it helps us compare the current sale_amount to the previous sale_amount in descending order as shown below;

| sale_id | client_name   | product_name | sale_amount | prev_sale_amount |
| ------- | ------------- | ------------ | ----------- | ---------------- |
| 7       | [NULL]        | Laptop       | 85,000      | [NULL]           |
| 1       | Alice Johnson | Laptop       | 75,000      | 85,000           |
| 4       | Diana Prince  | Monitor      | 35,000      | 75,000           |
| 5       | Ethan Hunt    | Printer      | 35,000      | 35,000           |
| 2       | Bob Smith     | Phone        | 25,000      | 35,000           |
| 3       | Charlie Brown | Tablet       | 25,000      | 25,000           |
| 6       | Fiona Clark   | Keyboard     | 1,000       | 25,000           |

f)NTILE(): This is used to divide rows into quartiles based on the total.
e.g. NTILE(4) – Divides the total rows into 4 equals parts
NTILE(10) – Divides total rows into 10 equal parts
Lets divide the sales table into 3 groups and order by quantity,(NTILE(3)).
We use the query;

SELECT *, 
    NTILE(3) OVER (ORDER BY quantity desc) as sale_amount_ntile
FROM sales;

From the query we get our sales table divided into 3 quartiles;

| sale_id | client_id | product_name | quantity | sale_amount | sale_amount_ntile |
| ------- | --------- | ------------ | -------- | ----------- | ----------------- |
| 3       | 3         | Tablet       | 10       | 25,000      | 1                 |
| 5       | 5         | Printer      | 7        | 35,000      | 1                 |
| 7       | [NULL]    | Laptop       | 3        | 85,000      | 1                 |
| 1       | 1         | Laptop       | 3        | 75,000      | 2                 |
| 2       | 2         | Phone        | 2        | 25,000      | 2                 |
| 6       | 6         | Keyboard     | 2        | 1,000       | 3                 |
| 4       | 4         | Monitor      | 1        | 35,000      | 3                 |

g)SUM() OVER() : is used to calculate the cumulative sum or partitioned sum of a column across a set of rows while preserving each individual row in a result set.
It allows you to compute running totals or group total without losing the detail of each row.

For example lets compute the running totals of the sale_amount using the query below;

SELECT *, 
    SUM(sale_amount) OVER (ORDER BY sale_amount desc) as     rounding_total
FROM sales;

From this query you get a rounding totals column;

| sale_id | client_id | product_name | quantity | sale_amount | rounding_total |
| ------- | --------- | ------------ | -------- | ----------- | -------------- |
| 7       | [NULL]    | Laptop       | 3        | 85,000      | 85,000         |
| 1       | 1         | Laptop       | 3        | 75,000      | 160,000        |
| 4       | 4         | Monitor      | 1        | 35,000      | 230,000        |
| 5       | 5         | Printer      | 7        | 35,000      | 230,000        |
| 2       | 2         | Phone        | 2        | 25,000      | 280,000        |
| 3       | 3         | Tablet       | 10       | 25,000      | 280,000        |
| 6       | 6         | Keyboard     | 2        | 1,000       | 281,000        |

h)PARTITION BY(): This divides the result set into partitions and the window functions work independently within each partition.

For example lets get the Average of the sale_amount and partition by quantity.

SELECT *, 
    AVG(sale_amount) OVER (PARTITION BY quantity) as avg_sale_amount
FROM sales;

This query calculates the average for ever quantity as shown below;

| sale_id | client_id | product_name | quantity | sale_amount | total_quantity |
| ------- | --------- | ------------ | -------- | ----------- | -------------- |
| 4       | 4         | Monitor      | 1        | 35,000      | 35,000         |
| 2       | 2         | Phone        | 2        | 25,000      | 13,000         |
| 6       | 6         | Keyboard     | 2        | 1,000       | 13,000         |
| 1       | 1         | Laptop       | 3        | 75,000      | 80,000         |
| 7       | [NULL]    | Laptop       | 3        | 85,000      | 80,000         |
| 5       | 5         | Printer      | 7        | 35,000      | 35,000         |
| 3       | 3         | Tablet       | 10       | 25,000      | 25,000         |

SUBQUERIES (Nested queries)

A Subquery is a SQL query written inside another query (a query within a query).
Subqueries allow you to perform an operation that depends on the result of another query.
We mainly use subqueries in order to simplify complex logic, to calculate intermediate results and to filter data dynamically.

Subqueries can appear in the:

a)The SELECT clause: These subqueries add an extra computed column to the result set. Each subquery executes once for every row of the outer query.
Here is an example of a query in the SELECT clause;

SELECT c.client_id, c.client_name, c.city ,(
    SELECT SUM(s.quantity) AS total_quantity
    FROM sales s
    WHERE c.client_id =s.client_id
    )
FROM clients c;

From this query we get

| client_id | client_name   | city         | total_quantity |
| --------- | ------------- | ------------ | -------------- |
| 1         | Alice Johnson | New York     | 3              |
| 2         | Bob Smith     | Los Angeles  | 2              |
| 3         | Charlie Brown | Chicago      | 10             |
| 4         | Diana Prince  | Houston      | 1              |
| 5         | Ethan Hunt    | Phoenix      | 7              |
| 6         | Fiona Clark   | Philadelphia | 2              |

b)The FROM clause: These subqueries create a temporary or derived table. The outer table can then select or filter data from it.

Below is an example of a subquery in the FROM clause;

SELECT * 
FROM (
    SELECT product_name, quantity, SUM(sale_amount) AS total_sales
    FROM sales
    GROUP BY product_name,quantity
    )
WHERE total_sales > 25000;

From this query we get;

| product_name | quantity | total_sales |
| ------------ | -------- | ----------- |
| Monitor      | 1        | 35,000      |
| Printer      | 7        | 35,000      |
| Laptop       | 3        | 160,000     |

c)The WHERE clause: These allow the outer query to filter results based on another query’s output.

Here is an example of a subquery in the WHERE clause;

SELECT c.client_name, c.city 
FROM clients c
WHERE client_id IN (
    SELECT s. client_id
    FROM sales s
    WHERE sale_amount > 25000
    );

From this query we get;

| client_name   | city     |
| ------------- | -------- |
| Alice Johnson | New York |
| Diana Prince  | Houston  |
| Ethan Hunt    | Phoenix  |

Subqueries can either be Correlated subqueries or Non-correlated queries.
Correlated subqueries: These subqueries depend on the outer query. They are evaluated once per row of the outer query.

Non-Correlated subqueries: These subqueries do not depend on the outer query. They can be executed on their own.

COMMON TABLE EXPRESSIONS (CTEs)

A CTE in SQL is basically creating a temporary table or result set and then querying that temporary table. The temporary table created only exists during the execution of a single SQL query.
A CTE helps make your SQL query easier to read and organize especially when your query becomes long or has multiple subqueries.
The Syntax of a CTE is;

WITH cte_name AS (
--SQL query that defines the temporary result
SELECT…
FROM
)
SELECT *
FROM cte_name;

Let us now explore different ways to write CTEs:

a)Basic CTEs: These are direct and straight forward CTEs.
For example, let us create a CTE sales_summary with columns product_name, quantity and total_sale_amount. Then we query this CTE.

 WITH sales_summary AS (
    SELECT product_name, quantity, SUM(sale_amount)
    FROM sales
    GROUP BY product_name, quantity
    )
 SELECT *
 FROM sales_summary;

From this query, we get;

| product_name | quantity | total_amount|
| ------------ | -------- | -------     |
| Monitor      | 1        | 35,000      |
| Printer      | 7        | 35,000      |
| Keyboard     | 2        | 1,000       |
| Tablet       | 10       | 25,000      |
| Laptop       | 3        | 160,000     |
| Phone        | 2        | 25,000      |

b)CTEs with Joins: This is where Joins are incorporated into the CTEs.

Here is an example of a sql query of a CTE with a join.

WITH client_purchases AS (
    SELECT s.client_id, c.client_name, s.product_name, SUM(s.quantity) AS total_quantity
    FROM sales s
    JOIN clients c
        ON s.client_id = c.client_id
    GROUP BY s.client_id, c.client_name, product_name
    )
 SELECT *
 FROM client_purchases
 ORDER BY total_quantity desc;

From this query we get;

| client_id | client_name   | product_name | total_quantity |
| --------- | ------------- | ------------ | -------------- |
| 3         | Charlie Brown | Tablet       | 10             |
| 5         | Ethan Hunt    | Printer      | 7              |
| 1         | Alice Johnson | Laptop       | 3              |
| 6         | Fiona Clark   | Keyboard     | 2              |
| 2         | Bob Smith     | Phone        | 2              |
| 4         | Diana Prince  | Monitor      | 1              |

c)Multiple CTEs in one query: This involves defining more than one CTE separated by commas.

Here is a sql query with multiple CTEs;

with client_purchases as (
    select s.client_id, c.client_name, s.product_name
    from sales s
    join clients c
        on s.client_id = c.client_id
    group by s.client_id, c.client_name, product_name
    ),
total_sales as(
    select s.client_id, SUM(s.sale_amount) as total_sale_amount
    from sales s
    group by s.client_id
    )
 select cp.client_id, cp.client_name, cp.product_name, ts.total_sale_amount
 from client_purchases cp
 join total_sales ts
    on cp.client_id = ts.client_id

From this query, we get;

| client_id | client_name   | product_name | total_sale_amount |
| --------- | ------------- | ------------ | ----------------- |
| 1         | Alice Johnson | Laptop       | 75,000            |
| 4         | Diana Prince  | Monitor      | 35,000            |
| 5         | Ethan Hunt    | Printer      | 35,000            |
| 2         | Bob Smith     | Phone        | 25,000            |
| 3         | Charlie Brown | Tablet       | 25,000            |
| 6         | Fiona Clark   | Keyboard     | 1,000             |

d)Recursive CTEs: used when you need to deal with hierarchial data such as managers and employees reporting chains.

     **<u>STORED PROCEDURES</u>**

A stored procedure is a named block of sql logic stored in a database and can be executed by calling.
Stored procedures are mainly used to automate repetitive tasks, for validation and to ensure consistency.

The syntax for Stored procedures is;

CREATE OR REPLACE PROCEDURE procedure_name 
LANGUAGE plpgsql 
AS $$ 
BEGIN 
   -- SQL statements 
END; 
$$;

Explanation of the stored procedure syntax:

CREATE → creates the procedure
OR REPLACE PROCEDURE → updates the procedure if it already exists
procedure_name → name of your procedure
LANGUAGE plpgsql(procedure language for postgresql) → enables procedural features
AS $$ ... $$ → defines the body of the procedure
BEGIN ... start of procedure logic
END; → start/end of procedure logic

After creating the procedure, you can call it using the syntax:

CALL procedure_name(...)

Let us use our clients table below to demonstrate on stored procedures.

clients table
| # | client_id | client_name   | city         |
| - | --------- | ------------- | ------------ |
| 1 | 1         | Alice Johnson | New York     |
| 2 | 2         | Bob Smith     | Los Angeles  |
| 3 | 3         | Charlie Brown | Chicago      |
| 4 | 4         | Diana Prince  | Houston      |
| 5 | 5         | Ethan Hunt    | Phoenix      |
| 6 | 6         | Fiona Clark   | Philadelphia |

For example let us create a stored procedure to add clients to our clients table using the syntax below;

CREATE OR REPLACE PROCEDURE add_clients(
c_client_name varchar,
c_city varchar)
LANGUAGE plpgsql
AS $$
BEGIN 
    INSERT INTO clients(client_name, city)
    VALUES(c_client_name, c_city);
END;
$$;

This query above creates a stored procedure named add_clients

To be able to call this stored procedure 'add_clients', we use the query below;

CALL add_clients('Sylvia Otiende', 'Liverpool');

This adds another client Sylvia Otiende in Liverpool to our clients list as shown below;

| client_id | client_name    | city         |
| --------- | -------------- | ------------ |
| 1         | Alice Johnson  | New York     |
| 2         | Bob Smith      | Los Angeles  |
| 3         | Charlie Brown  | Chicago      |
| 4         | Diana Prince   | Houston      |
| 5         | Ethan Hunt     | Phoenix      |
| 6         | Fiona Clark    | Philadelphia |
| 7         | Sylvia Otiende | Liverpool    |

Let us update our clients table and update the city of client_id 2 (Bob Smith) to Nairobi.
To create the update_clients procedure, we use the syntax below;

CREATE OR REPLACE PROCEDURE update_clients(
c_client_id int,
c_city varchar)
LANGUAGE plpgsql
AS $$
BEGIN
    UPDATE clients
    SET city = c_city
    WHERE client_id = c_client_id;
END;
$$;

To call and update client_id 2 city to Nairobi, we use the query;

CALL update_clients(2, 'Nairobi');

From calling , we get client_id 2 city updated from Los Angeles to Nairobi.

| client_id | client_name    | city         |
| --------- | -------------- | ------------ |
| 1         | Alice Johnson  | New York     |
| 3         | Charlie Brown  | Chicago      |
| 4         | Diana Prince   | Houston      |
| 5         | Ethan Hunt     | Phoenix      |
| 6         | Fiona Clark    | Philadelphia |
| 7         | Sylvia Otiende | Liverpool    |
| 2         | Bob Smith      | Nairobi      |

You can use stored procedures to;

Insert data into a table
update data in a table
Delete data
For validation

Stored procedures are mainly used in;

Banking to record transactions safely
E-commerce to create orders, update stock, apply discounts
HR to update staff records, compute payroll actions
Education to enroll students, assign classes, update results

Conclusion
In conclusion, advanced SQL techniques play a critical role in transforming raw data into meaningful insights.
By mastering concepts such as joins, window functions, CTEs, subqueries and stored procedures, data analysts are able to handle complex datasets efficiently while maintaining accuracy and clarity.
As organizations continue to rely on data for strategic growth, the ability to apply advanced SQL effectively remains an essential skill for every data analyst aiming to deliver impactful and data-driven solutions.

DEV Community

Advanced SQL for Data Analytics: Advanced Techniques Every Data Analyst Should Know

JOINS

WINDOW FUNCTIONS

SUBQUERIES (Nested queries)

COMMON TABLE EXPRESSIONS (CTEs)

Top comments (0)