Debunking the Myth: Is JOIN Always Faster Than Correlated Subqueries?

#performance #sql #database

Hey there, fellow developers! If you've ever dabbled in SQL, you've probably heard the golden rule: "Never use correlated subqueries in SELECT—they're a recipe for N+1 disasters!" Instead, we're told to always opt for JOINs because they're set-based, efficient, and lightning-fast.

But is this rule set in stone? I decided to put it to the test across four popular database systems: MySQL 8.0, Oracle 23c, PostgreSQL 16, and SQLite 3.45. Spoiler alert: The results were eye-opening. Sometimes, the "bad" correlated subquery outperformed the "good" JOIN. Let's dive in and see why.

The Test Setup: Customers and Orders
To keep things fair, I used a simple schema with two tables:

customers: A small table with 25 rows of customer data.
orders: A larger table with 1,000 rows of orders, linked via a foreign key.
The goal? Count the number of orders per customer, including those with zero orders.

Here's the schema (using MySQL syntax for reference):

-- Customers table
CREATE TABLE customers (
    customer_id INT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);

-- Orders table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATETIME,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id) ON DELETE CASCADE
);

Data was populated with random values to simulate real-world scenarios.

The Two Queries: JOIN vs. Correlated Subquery
I compared two approaches to achieve the same result.

The "Good" Way – JOIN + GROUP BY This is the set-based, relational approach everyone loves:

SELECT 
    c.customer_id, 
    COUNT(o.order_id) AS orders_count
FROM 
    customers c
LEFT JOIN 
    orders o ON c.customer_id = o.customer_id
GROUP BY 
    c.customer_id;

Pros: Handles all customers, even those without orders.
Theory: One optimized operation to join and aggregate.

The "Bad" Way – Correlated Subquery This is the row-by-row method we're warned against:

SELECT 
    c.customer_id, 
    (SELECT COUNT(o.order_id) 
     FROM orders o 
     WHERE o.customer_id = c.customer_id) AS orders_count
FROM 
    customers c;

Pros: Also includes customers with zero orders.
Theory: Executes a subquery for each customer—classic N+1 problem.

Testing Across Databases: The Results
I ran both queries on online SQL testers (links provided below) and analyzed execution times and plans using EXPLAIN. Here's what happened.

MySQL 8.0: Subquery Wins!
Execution Times: Subquery ~14 ms vs. JOIN ~16 ms.
Why? The subquery triggered a Nested Loop plan with fast index lookups (25 quick searches). JOIN used Hash Join + Aggregate, which was overkill for small data.
Key Insight: With an index on orders.customer_id, the subquery wasn't N+1—it was efficient Nested Loops.
Test Link: MySQL Tester

Oracle 23c: Subquery Dominates!
Execution Times: Subquery ~2.4 ms vs. JOIN ~15 ms.
Why? Similar to MySQL—Nested Loop for subquery vs. Hash Join for JOIN. The subquery avoided heavy aggregation overhead.
Key Insight: Indexes are crucial; without them, Oracle falls back to full scans.
Test Link: Oracle Tester

PostgreSQL 16: JOIN Takes the Lead
Execution Times: JOIN ~0.6 ms vs. Subquery ~1.9 ms.
Why? PostgreSQL's optimizer rewrote the subquery into a JOIN-like plan, but the explicit JOIN was slightly faster. Subquery showed 25 sub-plan executions (mild N+1).
Key Insight: PostgreSQL is smart—indexes level the playing field.
Test Link: PostgreSQL Tester

SQLite 3.45: A Tie!
Execution Times: Both ~1 ms.
Why? Plans were nearly identical: SCAN on customers + SEARCH on orders via index. No N+1 effect.
Key Insight: SQLite's simplicity made both queries efficient; choose based on readability.
Test Link: SQLite Tester

Key Takeaways: No Silver Bullet
The "JOIN is always faster" myth crumbles because performance depends on:

Database Optimizer: PostgreSQL rewrites queries; MySQL/Oracle follow your syntax more literally.
Data Size: Small outer tables (like our 25 customers) favor Nested Loops; large ones benefit from Hash Joins.
Indexes: Without an index on orders.customer_id, subqueries tank. With it, they shine.
Bottom Line: Don't blindly follow rules. Always run EXPLAIN (or EXPLAIN ANALYZE) to see the actual execution plan. Test with your data!

What are your experiences with JOINs vs. subqueries? Drop a comment below!

This article is based on real testing and analysis. Links to testers are provided for you to verify the results.