Day 7: Subqueries and Nested Queries
Welcome to Day 7! π
Today we're diving into subqueries - queries within queries! This powerful technique allows you to write complex data retrieval operations by breaking them into smaller, logical steps.
What You'll Learn Today
- Understanding Subqueries
- Scalar Subqueries
- Subqueries in WHERE Clause
- Subqueries with IN and EXISTS
- Correlated Subqueries
- Subqueries in FROM Clause (Derived Tables)
1. Understanding Subqueries
A subquery is a SELECT statement nested inside another SQL statement. Subqueries can appear in various parts of a query.
Basic Syntax
-- General form
SELECT column1
FROM table1
WHERE column2 = (SELECT column3 FROM table2 WHERE condition);
2. Scalar Subqueries
Scalar subqueries return a single value (one row, one column).
-- Find employees earning more than the average salary
SELECT
employee_id,
first_name,
last_name,
salary
FROM employees
WHERE salary > (SELECT AVG(salary) FROM employees);
-- Compare department salary to company average
SELECT
employee_id,
first_name,
department,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg,
salary - (SELECT AVG(salary) FROM employees) AS difference
FROM employees;
-- Find the most expensive product
SELECT
product_name,
price
FROM products
WHERE price = (SELECT MAX(price) FROM products);
3. Subqueries in WHERE Clause
Using subqueries for filtering and comparison operations.
Comparison Operators
-- Employees in the highest paying department
SELECT
first_name,
last_name,
department,
salary
FROM employees
WHERE department = (
SELECT department
FROM employees
GROUP BY department
ORDER BY AVG(salary) DESC
LIMIT 1
);
-- Products priced above category average
SELECT
product_name,
category,
price
FROM products p1
WHERE price > (
SELECT AVG(price)
FROM products p2
WHERE p2.category = p1.category
);
4. Subqueries with IN and EXISTS
Using IN
-- Employees in specific departments
SELECT
first_name,
last_name,
department
FROM employees
WHERE department IN (
SELECT department
FROM departments
WHERE location = 'New York'
);
-- Customers who made purchases in 2024
SELECT
customer_id,
customer_name,
email
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE EXTRACT(YEAR FROM order_date) = 2024
);
-- NOT IN for exclusion
SELECT
product_name
FROM products
WHERE product_id NOT IN (
SELECT product_id
FROM orders
);
-- Products never ordered
Using EXISTS
EXISTS checks if a subquery returns any rows (more efficient than IN for large datasets).
-- Customers with at least one order
SELECT
c.customer_id,
c.customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- NOT EXISTS for exclusion
SELECT
c.customer_id,
c.customer_name
FROM customers c
WHERE NOT EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Customers who never ordered
-- Departments with high-salary employees
SELECT DISTINCT
d.department_name
FROM departments d
WHERE EXISTS (
SELECT 1
FROM employees e
WHERE e.department_id = d.department_id
AND e.salary > 80000
);
5. Correlated Subqueries
Correlated subqueries reference columns from the outer query. They execute once for each row in the outer query.
-- Employees earning above their department average
SELECT
e1.employee_id,
e1.first_name,
e1.last_name,
e1.department,
e1.salary,
(SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department) AS dept_avg
FROM employees e1
WHERE e1.salary > (
SELECT AVG(e2.salary)
FROM employees e2
WHERE e2.department = e1.department
);
-- Products with above-average price in their category
SELECT
p1.product_name,
p1.category,
p1.price
FROM products p1
WHERE p1.price > (
SELECT AVG(p2.price)
FROM products p2
WHERE p2.category = p1.category
)
ORDER BY p1.category, p1.price DESC;
-- Customers with above-average order count
SELECT
c.customer_id,
c.customer_name,
(SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id) AS order_count
FROM customers c
WHERE (
SELECT COUNT(*)
FROM orders o
WHERE o.customer_id = c.customer_id
) > (
SELECT AVG(order_count)
FROM (
SELECT COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) AS counts
);
6. Subqueries in FROM Clause (Derived Tables)
You can use subqueries to create temporary result sets (derived tables).
-- Calculate statistics from aggregated data
SELECT
AVG(dept_avg_salary) AS company_wide_dept_average,
MAX(dept_avg_salary) AS highest_dept_average,
MIN(dept_avg_salary) AS lowest_dept_average
FROM (
SELECT
department,
AVG(salary) AS dept_avg_salary
FROM employees
GROUP BY department
) AS department_averages;
-- Rank employees by salary within their department
SELECT
employee_id,
first_name,
department,
salary,
dept_rank
FROM (
SELECT
employee_id,
first_name,
department,
salary,
RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
FROM employees
) AS ranked_employees
WHERE dept_rank <= 3;
-- Monthly sales growth analysis
SELECT
month,
monthly_revenue,
LAG(monthly_revenue) OVER (ORDER BY month) AS previous_month,
monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY month) AS growth
FROM (
SELECT
DATE_TRUNC('month', order_date) AS month,
SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY DATE_TRUNC('month', order_date)
) AS monthly_data
ORDER BY month;
7. Advanced Patterns
Multiple Subqueries
-- Compare to multiple benchmarks
SELECT
employee_id,
first_name,
salary,
CASE
WHEN salary > (SELECT AVG(salary) FROM employees)
THEN 'Above Average'
ELSE 'Below Average'
END AS vs_company_avg,
CASE
WHEN salary > (SELECT PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY salary) FROM employees)
THEN 'Top 25%'
ELSE 'Bottom 75%'
END AS percentile_category
FROM employees;
ANY and ALL Operators
-- ANY: True if condition is met for any value
SELECT product_name, price
FROM products
WHERE price > ANY (
SELECT price
FROM products
WHERE category = 'Electronics'
);
-- ALL: True if condition is met for all values
SELECT employee_id, first_name, salary
FROM employees
WHERE salary > ALL (
SELECT salary
FROM employees
WHERE department = 'Intern'
);
Real-World Example: E-commerce Analysis
-- Comprehensive customer segmentation
SELECT
c.customer_id,
c.customer_name,
customer_stats.total_orders,
customer_stats.total_spent,
customer_stats.avg_order_value,
CASE
WHEN customer_stats.total_spent > (
SELECT AVG(total_spent)
FROM (
SELECT SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
) AS all_customers
) THEN 'High Value'
ELSE 'Regular'
END AS customer_segment
FROM customers c
JOIN (
SELECT
customer_id,
COUNT(*) AS total_orders,
SUM(total_amount) AS total_spent,
AVG(total_amount) AS avg_order_value
FROM orders
GROUP BY customer_id
) AS customer_stats ON c.customer_id = customer_stats.customer_id
WHERE customer_stats.total_orders > (
SELECT AVG(order_count)
FROM (
SELECT COUNT(*) AS order_count
FROM orders
GROUP BY customer_id
) AS counts
)
ORDER BY customer_stats.total_spent DESC;
Performance Tips
- EXISTS vs IN: Use EXISTS for better performance with large datasets
- Avoid correlated subqueries when possible - they can be slow
- Consider CTEs: For complex nested queries, Common Table Expressions (CTEs) are more readable
- Index columns: Used in subquery WHERE clauses
- Test with EXPLAIN: Analyze query execution plans
-- Compare performance
EXPLAIN ANALYZE
SELECT * FROM customers
WHERE customer_id IN (SELECT customer_id FROM orders);
EXPLAIN ANALYZE
SELECT * FROM customers c
WHERE EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id);
Common Mistakes to Avoid
-- β WRONG: Subquery returns multiple rows
SELECT * FROM employees
WHERE department = (SELECT department FROM employees);
-- Error: Subquery returns more than one row
-- β
CORRECT: Use IN for multiple values
SELECT * FROM employees
WHERE department IN (SELECT DISTINCT department FROM employees);
-- β WRONG: NULL handling with NOT IN
SELECT * FROM products
WHERE product_id NOT IN (SELECT product_id FROM orders);
-- Issue: Returns unexpected results if orders.product_id contains NULL
-- β
CORRECT: Use NOT EXISTS or filter NULLs
SELECT * FROM products p
WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.product_id = p.product_id);
Today's Challenge! π―
Write queries to solve these problems:
- Find all products that are more expensive than the average price of products in their category
- List customers who have spent more than the average customer spending
- Identify departments where the minimum salary is higher than the company-wide average salary
- Find products that have never been ordered (use two different methods: NOT IN and NOT EXISTS)
-- Your solutions here:
-- Query 1:
-- Query 2:
-- Query 3:
-- Query 4a (using NOT IN):
-- Query 4b (using NOT EXISTS):
Summary
Today you mastered:
- β Scalar subqueries for single-value comparisons
- β Subqueries with IN, EXISTS, ANY, and ALL
- β Correlated subqueries for row-by-row operations
- β Derived tables for complex data transformations
- β Performance optimization techniques
Coming Up Next!
Day 8: Common Table Expressions (CTEs) and WITH Clause - Learn a cleaner, more readable way to write complex queries! π
Questions? Share your solutions in the comments! π¬
Top comments (0)