Hardik Kanajariya

Posted on Oct 25

Day 7: Subqueries and Nested Queries

#postgres #beginners #tutorial #learning

Day 7: Subqueries and Nested Queries

Welcome to Day 7! 🔍

Today we're diving into subqueries - queries within queries! This powerful technique allows you to write complex data retrieval operations by breaking them into smaller, logical steps.

What You'll Learn Today

Understanding Subqueries
Scalar Subqueries
Subqueries in WHERE Clause
Subqueries with IN and EXISTS
Correlated Subqueries
Subqueries in FROM Clause (Derived Tables)

1. Understanding Subqueries

A subquery is a SELECT statement nested inside another SQL statement. Subqueries can appear in various parts of a query.

Basic Syntax

-- General form
SELECT column1
FROM table1
WHERE column2 = (SELECT column3 FROM table2 WHERE condition);

2. Scalar Subqueries

Scalar subqueries return a single value (one row, one column).

-- Find employees earning more than the average salary
SELECT 
    employee_id,
    first_name,
    last_name,
    salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);

-- Compare department salary to company average
SELECT 
    employee_id,
    first_name,
    department,
    salary,
    (SELECT AVG(salary) FROM employees) AS company_avg,
    salary - (SELECT AVG(salary) FROM employees) AS difference
FROM employees;

-- Find the most expensive product
SELECT 
    product_name,
    price
FROM products
WHERE price = (SELECT MAX(price) FROM products);

3. Subqueries in WHERE Clause

Using subqueries for filtering and comparison operations.

Comparison Operators

-- Employees in the highest paying department
SELECT 
    first_name,
    last_name,
    department,
    salary
FROM employees
WHERE department = (
    SELECT department
    FROM employees
    GROUP BY department
    ORDER BY AVG(salary) DESC
    LIMIT 1
);

-- Products priced above category average
SELECT 
    product_name,
    category,
    price
FROM products p1
WHERE price > (
    SELECT AVG(price)
    FROM products p2
    WHERE p2.category = p1.category
);

4. Subqueries with IN and EXISTS

Using IN

-- Employees in specific departments
SELECT 
    first_name,
    last_name,
    department
FROM employees
WHERE department IN (
    SELECT department
    FROM departments
    WHERE location = 'New York'
);

-- Customers who made purchases in 2024
SELECT 
    customer_id,
    customer_name,
    email
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2024
);

-- NOT IN for exclusion
SELECT 
    product_name
FROM products
WHERE product_id NOT IN (
    SELECT product_id
    FROM orders
);
-- Products never ordered

Using EXISTS

EXISTS checks if a subquery returns any rows (more efficient than IN for large datasets).

-- Customers with at least one order
SELECT 
    c.customer_id,
    c.customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- NOT EXISTS for exclusion
SELECT 
    c.customer_id,
    c.customer_name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);
-- Customers who never ordered

-- Departments with high-salary employees
SELECT DISTINCT
    d.department_name
FROM departments d
WHERE EXISTS (
    SELECT 1
    FROM employees e
    WHERE e.department_id = d.department_id
    AND e.salary > 80000
);

5. Correlated Subqueries

Correlated subqueries reference columns from the outer query. They execute once for each row in the outer query.

-- Employees earning above their department average
SELECT 
    e1.employee_id,
    e1.first_name,
    e1.last_name,
    e1.department,
    e1.salary,
    (SELECT AVG(e2.salary)
     FROM employees e2
     WHERE e2.department = e1.department) AS dept_avg
FROM employees e1
WHERE e1.salary > (
    SELECT AVG(e2.salary)
    FROM employees e2
    WHERE e2.department = e1.department
);

-- Products with above-average price in their category
SELECT 
    p1.product_name,
    p1.category,
    p1.price
FROM products p1
WHERE p1.price > (
    SELECT AVG(p2.price)
    FROM products p2
    WHERE p2.category = p1.category
)
ORDER BY p1.category, p1.price DESC;

-- Customers with above-average order count
SELECT 
    c.customer_id,
    c.customer_name,
    (SELECT COUNT(*)
     FROM orders o
     WHERE o.customer_id = c.customer_id) AS order_count
FROM customers c
WHERE (
    SELECT COUNT(*)
    FROM orders o
    WHERE o.customer_id = c.customer_id
) > (
    SELECT AVG(order_count)
    FROM (
        SELECT COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id
    ) AS counts
);

6. Subqueries in FROM Clause (Derived Tables)

You can use subqueries to create temporary result sets (derived tables).

-- Calculate statistics from aggregated data
SELECT 
    AVG(dept_avg_salary) AS company_wide_dept_average,
    MAX(dept_avg_salary) AS highest_dept_average,
    MIN(dept_avg_salary) AS lowest_dept_average
FROM (
    SELECT 
        department,
        AVG(salary) AS dept_avg_salary
    FROM employees
    GROUP BY department
) AS department_averages;

-- Rank employees by salary within their department
SELECT 
    employee_id,
    first_name,
    department,
    salary,
    dept_rank
FROM (
    SELECT 
        employee_id,
        first_name,
        department,
        salary,
        RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
    FROM employees
) AS ranked_employees
WHERE dept_rank <= 3;

-- Monthly sales growth analysis
SELECT 
    month,
    monthly_revenue,
    LAG(monthly_revenue) OVER (ORDER BY month) AS previous_month,
    monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY month) AS growth
FROM (
    SELECT 
        DATE_TRUNC('month', order_date) AS month,
        SUM(total_amount) AS monthly_revenue
    FROM orders
    GROUP BY DATE_TRUNC('month', order_date)
) AS monthly_data
ORDER BY month;

7. Advanced Patterns

Multiple Subqueries

-- Compare to multiple benchmarks
SELECT 
    employee_id,
    first_name,
    salary,
    CASE 
        WHEN salary > (SELECT AVG(salary) FROM employees) 
        THEN 'Above Average'
        ELSE 'Below Average'
    END AS vs_company_avg,
    CASE 
        WHEN salary > (SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) FROM employees)
        THEN 'Top 25%'
        ELSE 'Bottom 75%'
    END AS percentile_category
FROM employees;

ANY and ALL Operators

-- ANY: True if condition is met for any value
SELECT product_name, price
FROM products
WHERE price > ANY (
    SELECT price
    FROM products
    WHERE category = 'Electronics'
);

-- ALL: True if condition is met for all values
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > ALL (
    SELECT salary
    FROM employees
    WHERE department = 'Intern'
);

Real-World Example: E-commerce Analysis

-- Comprehensive customer segmentation
SELECT 
    c.customer_id,
    c.customer_name,
    customer_stats.total_orders,
    customer_stats.total_spent,
    customer_stats.avg_order_value,
    CASE 
        WHEN customer_stats.total_spent > (
            SELECT AVG(total_spent)
            FROM (
                SELECT SUM(total_amount) AS total_spent
                FROM orders
                GROUP BY customer_id
            ) AS all_customers
        ) THEN 'High Value'
        ELSE 'Regular'
    END AS customer_segment
FROM customers c
JOIN (
    SELECT 
        customer_id,
        COUNT(*) AS total_orders,
        SUM(total_amount) AS total_spent,
        AVG(total_amount) AS avg_order_value
    FROM orders
    GROUP BY customer_id
) AS customer_stats ON c.customer_id = customer_stats.customer_id
WHERE customer_stats.total_orders > (
    SELECT AVG(order_count)
    FROM (
        SELECT COUNT(*) AS order_count
        FROM orders
        GROUP BY customer_id
    ) AS counts
)
ORDER BY customer_stats.total_spent DESC;

Performance Tips

EXISTS vs IN: Use EXISTS for better performance with large datasets
Avoid correlated subqueries when possible - they can be slow
Consider CTEs: For complex nested queries, Common Table Expressions (CTEs) are more readable
Index columns: Used in subquery WHERE clauses
Test with EXPLAIN: Analyze query execution plans

-- Compare performance
EXPLAIN ANALYZE
SELECT * FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);

EXPLAIN ANALYZE
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);

Common Mistakes to Avoid

-- ❌ WRONG: Subquery returns multiple rows
SELECT * FROM employees
WHERE department = (SELECT department FROM employees);
-- Error: Subquery returns more than one row

-- ✅ CORRECT: Use IN for multiple values
SELECT * FROM employees
WHERE department IN (SELECT DISTINCT department FROM employees);

-- ❌ WRONG: NULL handling with NOT IN
SELECT * FROM products
WHERE product_id NOT IN (SELECT product_id FROM orders);
-- Issue: Returns unexpected results if orders.product_id contains NULL

-- ✅ CORRECT: Use NOT EXISTS or filter NULLs
SELECT * FROM products p
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.product_id = p.product_id);

Today's Challenge! 🎯

Write queries to solve these problems:

Find all products that are more expensive than the average price of products in their category
List customers who have spent more than the average customer spending
Identify departments where the minimum salary is higher than the company-wide average salary
Find products that have never been ordered (use two different methods: NOT IN and NOT EXISTS)

-- Your solutions here:

-- Query 1:

-- Query 2:

-- Query 3:

-- Query 4a (using NOT IN):

-- Query 4b (using NOT EXISTS):

Summary

Today you mastered:

✅ Scalar subqueries for single-value comparisons
✅ Subqueries with IN, EXISTS, ANY, and ALL
✅ Correlated subqueries for row-by-row operations
✅ Derived tables for complex data transformations
✅ Performance optimization techniques

Coming Up Next!

Day 8: Common Table Expressions (CTEs) and WITH Clause - Learn a cleaner, more readable way to write complex queries! 🎉

Questions? Share your solutions in the comments! 💬

PostgreSQL #Database #SQL #Subqueries #15DaysOfPostgreSQL

DEV Community

Day 7: Subqueries and Nested Queries

Day 7: Subqueries and Nested Queries

Welcome to Day 7! 🔍

What You'll Learn Today

1. Understanding Subqueries

Basic Syntax

2. Scalar Subqueries

3. Subqueries in WHERE Clause

Comparison Operators

4. Subqueries with IN and EXISTS

Using IN

Using EXISTS

5. Correlated Subqueries

6. Subqueries in FROM Clause (Derived Tables)

7. Advanced Patterns

Multiple Subqueries

ANY and ALL Operators

Real-World Example: E-commerce Analysis

Performance Tips

Common Mistakes to Avoid

Today's Challenge! 🎯

Summary

Coming Up Next!

PostgreSQL #Database #SQL #Subqueries #15DaysOfPostgreSQL

Top comments (0)