Structured Query Language (SQL) is one of the most widely used languages for interacting with databases, yet even experienced developers often make subtle mistakes that affect performance, readability, and security. Writing high-quality SQL queries is critical for scalability, maintainability, and efficiency.
1. The Siren Call of SELECT *
Instead of Explicit Columns
Problem: SELECT *
retrieves every column from a table, even those not needed. This increases network load, slows down queries, and can break applications if the schema changes.
Solution: Always specify the exact columns you need:
SELECT id, name, created_at FROM users;
-- Instead of:
SELECT * FROM orders;
-- Use:
SELECT order_id, customer_id, order_date, total_amount
FROM orders;
This improves performance and keeps your queries predictable.
2. Neglecting the Power of Indexes
Problem: Queries that filter large tables without proper indexes often result in full table scans, significantly slowing down performance.
The Mistake: Writing predicates that prevent index usage.
Common culprits include:
- Wrapping a column in a function:
WHERE YEAR(order_date) = 2023
- Using a wildcard at the beginning of a
LIKE
pattern:WHERE customer_name LIKE '%Smith%'
- Using
OR
conditions on different columns without appropriate indexes. The Impact: The query forces a full table scan, which becomes exponentially slower as the table grows.
Solution: Ensure that columns used in WHERE
, JOIN
, and ORDER BY
clauses are indexed where appropriate. Always analyze query execution plans to confirm indexes are being used.
code
-- Instead of (non-sargable):
SELECT * FROM orders WHERE YEAR(order_date) = 2023;
-- Use (sargable - Search ARGument ABLE):
SELECT * FROM orders WHERE order_date >= '2023-01-01' AND order_date < '2024-01-01';
3. Mishandling NULL
Values
Problem: Failing to account for NULL
values can produce unexpected results, especially with comparison operators.
Solution: Use IS NULL
or COALESCE()
or IS NOT NULL
or IFNULL()
to handle null values explicitly:
SELECT COALESCE(email, 'kipngenogregory@gmail.com') AS safe_email FROM users;
or
-- Instead of (will not find NULL phone numbers):
SELECT * FROM customers WHERE phone_number = NULL;
-- Use:
SELECT * FROM customers WHERE phone_number IS NULL;
-- To safely perform calculations:
SELECT product_id, price * COALESCE(quantity, 0) AS estimated_value FROM order_items;
4. Improper Filtering in GROUP BY
and HAVING
A confusion between the WHERE
and HAVING
clauses leads to inefficient queries.
- The Mistake: Using the
HAVING
clause to filter rows before aggregation. - The Impact: The
HAVING
clause filters groups after they have been aggregated. Filtering individual rows first is the job of the WHERE clause, which is far more efficient as it reduces the working data set before the costly aggregation operation. The Professional Approach: UseWHERE
to filter rows. UseHAVING
to filter groups.
-- Inefficient:
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
GROUP BY customer_id
HAVING customer_id > 100; -- Filtering on a single row *after* grouping
-- Efficient:
SELECT customer_id, COUNT(order_id) AS order_count
FROM orders
WHERE customer_id > 100 -- Filter rows *before* grouping
GROUP BY customer_id;
Conclusion
By eschewing SELECT *
, respecting indexes, handling NULLs
correctly, using explicit joins, and filtering strategically, you elevate your code from merely functional to truly exceptional. This leads to systems that are faster, more reliable, and easier to debug—a hallmark of a true data professional.
by gregory.tech
Top comments (0)