Hey there, fellow developers! If you've ever dabbled in SQL, you've probably heard the golden rule: "Never use correlated subqueries in SELECT—they're a recipe for N+1 disasters!" Instead, we're told to always opt for JOINs because they're set-based, efficient, and lightning-fast.
But is this rule set in stone? I decided to put it to the test across four popular database systems: MySQL 8.0, Oracle 23c, PostgreSQL 16, and SQLite 3.45. Spoiler alert: The results were eye-opening. Sometimes, the "bad" correlated subquery outperformed the "good" JOIN. Let's dive in and see why.
The Test Setup: Customers and Orders
To keep things fair, I used a simple schema with two tables:
customers: A small table with 25 rows of customer data.
orders: A larger table with 1,000 rows of orders, linked via a foreign key.
The goal? Count the number of orders per customer, including those with zero orders.
Here's the schema (using MySQL syntax for reference):
Data was populated with random values to simulate real-world scenarios.
The Two Queries: JOIN vs. Correlated Subquery
I compared two approaches to achieve the same result.
- The "Good" Way – JOIN + GROUP BY This is the set-based, relational approach everyone loves:
- Pros: Handles all customers, even those without orders.
- Theory: One optimized operation to join and aggregate.
- The "Bad" Way – Correlated Subquery This is the row-by-row method we're warned against:
- Pros: Also includes customers with zero orders.
- Theory: Executes a subquery for each customer—classic N+1 problem.
Testing Across Databases: The Results
I ran both queries on online SQL testers (links provided below) and analyzed execution times and plans using EXPLAIN. Here's what happened.
MySQL 8.0: Subquery Wins!
Execution Times: Subquery ~14 ms vs. JOIN ~16 ms.
Why? The subquery triggered a Nested Loop plan with fast index lookups (25 quick searches). JOIN used Hash Join + Aggregate, which was overkill for small data.
Key Insight: With an index on orders.customer_id, the subquery wasn't N+1—it was efficient Nested Loops.
Test Link: MySQL Tester
Oracle 23c: Subquery Dominates!
Execution Times: Subquery ~2.4 ms vs. JOIN ~15 ms.
Why? Similar to MySQL—Nested Loop for subquery vs. Hash Join for JOIN. The subquery avoided heavy aggregation overhead.
Key Insight: Indexes are crucial; without them, Oracle falls back to full scans.
Test Link: Oracle Tester
PostgreSQL 16: JOIN Takes the Lead
Execution Times: JOIN ~0.6 ms vs. Subquery ~1.9 ms.
Why? PostgreSQL's optimizer rewrote the subquery into a JOIN-like plan, but the explicit JOIN was slightly faster. Subquery showed 25 sub-plan executions (mild N+1).
Key Insight: PostgreSQL is smart—indexes level the playing field.
Test Link: PostgreSQL Tester
SQLite 3.45: A Tie!
Execution Times: Both ~1 ms.
Why? Plans were nearly identical: SCAN on customers + SEARCH on orders via index. No N+1 effect.
Key Insight: SQLite's simplicity made both queries efficient; choose based on readability.
Test Link: SQLite Tester
Key Takeaways: No Silver Bullet
The "JOIN is always faster" myth crumbles because performance depends on:
Database Optimizer: PostgreSQL rewrites queries; MySQL/Oracle follow your syntax more literally.
Data Size: Small outer tables (like our 25 customers) favor Nested Loops; large ones benefit from Hash Joins.
Indexes: Without an index on orders.customer_id, subqueries tank. With it, they shine.
Bottom Line: Don't blindly follow rules. Always run EXPLAIN (or EXPLAIN ANALYZE) to see the actual execution plan. Test with your data!
What are your experiences with JOINs vs. subqueries? Drop a comment below!
This article is based on real testing and analysis. Links to testers are provided for you to verify the results.
Top comments (0)