sql between is the most innocent-looking keyword in the entire SQL surface — three letters, two operands, "inclusive on both ends," done. Then your nightly report misses every record that landed on the last day of the month after midnight, your float column quietly excludes the values you swore were inside the range, and your created_at BETWEEN '2026-01-01' AND '2026-01-31' filter omits 23 hours and 59 minutes of traffic. The keyword isn't broken; the mental model "inclusive both ends" simply doesn't survive contact with timestamps, floats, and back-to-back monthly buckets.
This guide walks through every between sql pattern an interviewer or a production query reviewer will probe — numeric BETWEEN over INT, NUMERIC, and FLOAT (the IEEE-754 trap that swallows 0.2), the date BETWEEN time-truncation pitfall that ranks as the #1 production query bug, half-open intervals [start, end) as the only sql range query notation that composes cleanly, the four-quadrant matrix of closed/open endpoints, ANSI OVERLAPS, Postgres tstzrange, and sql between dates performance under B-tree, BRIN, and partition pruning. Each ### Solution Using … ships the four-part Solution Tail — code, step-by-step trace, output table, and a concept-by-concept why — so you can defend the answer in a sql between numbers interview the same way a senior DE defends a sql date range rewrite in code review.
When you want hands-on reps immediately after reading, drill the SQL practice library →, rehearse on filtering problems (SQL) →, and stress-test edge cases with date functions (SQL) →.
On this page
- Why BETWEEN is a footgun (and the half-open fix in one paragraph)
- Numeric BETWEEN — INT, NUMERIC, FLOAT, and the precision trap
- Date BETWEEN — the time-truncation pitfall
- Half-open intervals — the production pattern
- BETWEEN performance — indexes, BRIN, partition pruning
- Cheat sheet — interval recipes for production
- Frequently asked questions
- Practice on PipeCode
1. Why BETWEEN is a footgun (and the half-open fix in one paragraph)
sql between is closed-closed by ANSI standard — most production queries want closed-open
The one-sentence definition: x BETWEEN a AND b is exactly equivalent to x >= a AND x <= b — both endpoints are inclusive. That's it. Every SQL dialect (Postgres, MySQL, SQL Server, Oracle, SQLite, BigQuery, Snowflake, Redshift, DuckDB) agrees on this; the ANSI/ISO SQL standard pins it down in the same way. The keyword is shorthand, not magic — the optimiser unfolds it into a pair of comparisons before it ever touches the planner.
The closed-closed semantics are not the problem. The problem is that the mental model "inclusive on both ends" maps cleanly to integers and money columns but breaks the moment one of the operands is a timestamp (which has hours, minutes, seconds, microseconds) or a float (which is stored as a binary fraction and rounds in unintuitive ways). Half the production query bugs that ship to incident channels under headlines like "monthly revenue report is off by a day" are date-BETWEEN bugs.
The three flavours of BETWEEN and what they actually do.
-
Numeric BETWEEN.
price BETWEEN 100 AND 200→price >= 100 AND price <= 200. Predictable for INT and NUMERIC; precision-sensitive for FLOAT. -
Date / timestamp BETWEEN.
created_at BETWEEN '2026-01-01' AND '2026-01-31'→created_at >= '2026-01-01 00:00:00' AND created_at <= '2026-01-31 00:00:00'. The right-hand operand is implicitly cast to midnight, so 23h 59m of Jan 31 is excluded. This is the bug. -
String BETWEEN.
name BETWEEN 'A' AND 'M'→ lexicographic comparison. Less common in interviews; covered briefly later for completeness.
Why senior DEs prefer col >= start AND col < end (half-open) over BETWEEN.
-
Composability. Half-open intervals
[start, end)for back-to-back months —[Jan, Feb),[Feb, Mar)— never double-count and never gap. The end of one interval is exactly the start of the next, and the open end excludes Feb 1 from[Jan, Feb)so Feb 1 belongs to[Feb, Mar)only. -
Time-grain independence.
created_at < '2026-02-01'correctly excludes everything on or after Feb 1 — including2026-01-31 23:59:59.999999. You don't have to know whether the column is DATE, TIMESTAMP, or TIMESTAMPTZ; you don't have to remember whether your engine stores microseconds or nanoseconds. - Index-friendly. Both BETWEEN and half-open compile to the same B-tree range scan, so the rewrite costs zero performance. The choice is purely about correctness.
-
Reads cleanly.
created_at >= '2026-01-01' AND created_at < '2026-02-01'is self-documenting — anyone reading the query knows you mean "all of January" without having to know what "BETWEEN" cast the end operand to.
What interviewers listen for when sql between comes up.
- Do you mention that BETWEEN is closed on both ends? — baseline answer.
- Do you bring up the date BETWEEN time-truncation pitfall unprompted? — senior signal.
- Do you reach for half-open intervals as the production default? — senior+ signal.
- Do you mention BRIN indexes or partition pruning when the question becomes performance-flavoured? — staff-level signal.
- Do you note
OVERLAPSortstzrangewhen the problem becomes interval-vs-interval rather than scalar-vs-interval? — Postgres-fluent signal.
Worked example — show that BETWEEN is closed on both ends
Detailed explanation. The simplest demonstration is to populate a tiny table with the boundary values and check which ones pass BETWEEN. It's the kind of one-page demo that ends a "convince me you understand inclusion" interview detour.
Question. Given a table ranges(id INT, value INT) with rows {1, 10}, {2, 50}, {3, 100}, {4, 150}, {5, 200}, {6, 250}, which rows satisfy value BETWEEN 50 AND 200?
Input.
| id | value |
|---|---|
| 1 | 10 |
| 2 | 50 |
| 3 | 100 |
| 4 | 150 |
| 5 | 200 |
| 6 | 250 |
Code.
SELECT id, value
FROM ranges
WHERE value BETWEEN 50 AND 200
ORDER BY id;
Step-by-step explanation.
- The planner rewrites
value BETWEEN 50 AND 200tovalue >= 50 AND value <= 200. This is the only thing BETWEEN ever does at the engine level. - Row 1 (
value=10) fails thevalue >= 50test and is excluded. - Row 2 (
value=50) satisfiesvalue >= 50 AND value <= 200— both endpoints are inclusive, so50passes. - Rows 3 and 4 fall comfortably inside the interval.
- Row 5 (
value=200) satisfiesvalue <= 200— again, inclusive on the upper end. - Row 6 (
value=250) fails thevalue <= 200test and is excluded.
Output.
| id | value |
|---|---|
| 2 | 50 |
| 3 | 100 |
| 4 | 150 |
| 5 | 200 |
Rule of thumb. When you read x BETWEEN a AND b, mentally rewrite it to x >= a AND x <= b before you reason about edge cases. The shorthand hides nothing — but the closed-closed shape is exactly the part that breaks for timestamps and floats.
Worked example — NOT BETWEEN and the NULL gotcha
Detailed explanation. A common follow-up probe is NOT BETWEEN. The keyword negates the entire inclusion predicate — x NOT BETWEEN a AND b is x < a OR x > b. The interview gotcha is that it shares SQL's three-valued logic with every other comparison: NULL NOT BETWEEN a AND b evaluates to NULL, which fails the WHERE clause. Candidates who write WHERE amount NOT BETWEEN 100 AND 200 to "filter out the middle band" are surprised when NULL-amount rows also vanish.
Question. With the rows {1, 50}, {2, 100}, {3, 150}, {4, 200}, {5, 250}, {6, NULL}, what does WHERE value NOT BETWEEN 100 AND 200 return? Then rewrite it to include the NULL row explicitly.
Input.
| id | value |
|---|---|
| 1 | 50 |
| 2 | 100 |
| 3 | 150 |
| 4 | 200 |
| 5 | 250 |
| 6 | NULL |
Code.
-- Naïve NOT BETWEEN — misses NULL rows
SELECT id, value
FROM ranges
WHERE value NOT BETWEEN 100 AND 200
ORDER BY id;
-- The fix when NULL should also be considered "outside the band"
SELECT id, value
FROM ranges
WHERE value NOT BETWEEN 100 AND 200
OR value IS NULL
ORDER BY id;
Step-by-step explanation.
-
Row 1 (
value=50).50 NOT BETWEEN 100 AND 200→50 < 100 OR 50 > 200→ true. Passes. -
Rows 2, 3, 4 (
value=100, 150, 200). All inside the closed interval →NOT BETWEENis false. Excluded. -
Row 5 (
value=250).250 > 200→ true. Passes. -
Row 6 (
value=NULL).NULL NOT BETWEEN 100 AND 200→NULL < 100 OR NULL > 200→NULL OR NULL→NULL. The WHERE clause requires true; NULL fails it. Excluded — even though it isn't "inside" the band. -
The fix. Add
OR value IS NULLto include NULL rows explicitly. Three-valued logic doesn't bite if you handle the NULL case yourself.
Output (naïve query).
| id | value |
|---|---|
| 1 | 50 |
| 5 | 250 |
Output (fixed query).
| id | value |
|---|---|
| 1 | 50 |
| 5 | 250 |
| 6 | NULL |
Rule of thumb. BETWEEN, NOT BETWEEN, =, <>, <, >, <=, >= — every comparison against NULL is NULL, which fails the WHERE clause. If "NULL means missing data and should be visible," handle it with an explicit IS NULL or COALESCE.
SQL interview question on BETWEEN inclusion
A common opener: "We have a column score and a filter score BETWEEN 60 AND 100. A row with score = 100 showed up in production. Is that expected, and how would you write the filter to exclude exactly the upper boundary if a stakeholder asks?"
Solution Using a half-open rewrite score >= 60 AND score < 100
-- Original (inclusive both ends): includes 100
SELECT id, score
FROM grades
WHERE score BETWEEN 60 AND 100;
-- Rewrite to exclude the upper bound (half-open [60, 100)):
SELECT id, score
FROM grades
WHERE score >= 60 AND score < 100;
-- Rewrite to exclude the lower bound only ((60, 100]):
SELECT id, score
FROM grades
WHERE score > 60 AND score <= 100;
Step-by-step trace.
| id | score | BETWEEN 60 AND 100? | >= 60 AND < 100? |
|---|---|---|---|
| 1 | 55 | no — fails score >= 60
|
no — fails score >= 60
|
| 2 | 60 | yes — boundary, inclusive | yes — boundary, inclusive lower |
| 3 | 80 | yes — inside | yes — inside |
| 4 | 100 | yes — boundary, inclusive | no — open upper, 100 excluded |
| 5 | 101 | no — fails score <= 100
|
no — fails score < 100
|
Output:
| Filter | id values that pass |
|---|---|
BETWEEN 60 AND 100 |
2, 3, 4 |
>= 60 AND < 100 |
2, 3 |
> 60 AND <= 100 |
3, 4 |
Why this works — concept by concept:
-
BETWEEN is shorthand for closed-closed —
x BETWEEN a AND bis exactlyx >= a AND x <= b. No dialect deviates from this. -
Half-open
[a, b)—>= a AND < bincludes the lower boundary, excludes the upper. The default production interval shape because adjacent intervals don't overlap. -
Half-open
(a, b]—> a AND <= bis the mirror image. Rarer in practice; useful when "cumulative up to and including b" is the semantic you want. -
Open-open
(a, b)—> a AND < b. Almost never the right choice; both boundaries are excluded. - Cost — all four forms compile to the same B-tree range scan; the cost difference is zero. The choice is purely a correctness decision, not a performance one.
SQL
Topic — filtering (SQL)
Filtering & range query problems (SQL)
2. Numeric BETWEEN — INT, NUMERIC, FLOAT, and the precision trap
sql between numbers is predictable for INT and NUMERIC — and surprisingly broken for FLOAT
For most numeric types, BETWEEN does what the mental model says — closed-closed inclusion over exact arithmetic. The break is FLOAT (and its siblings DOUBLE PRECISION, REAL, BINARY_FLOAT). Floats are stored as IEEE-754 binary fractions, and most decimal values you type (0.1, 0.2, 0.3) cannot be represented exactly — they're rounded to the nearest binary fraction at insert time and again at compare time. That rounding occasionally pushes a "0.2" outside a "BETWEEN 0.1 AND 0.3" filter you swore would catch it.
The three numeric column families and how BETWEEN behaves on each.
-
Integer types (
SMALLINT,INT,BIGINT). Exact storage.BETWEEN 100 AND 200works the way the textbook says — every value 100..200 inclusive passes. -
Fixed-point types (
NUMERIC(p, s),DECIMAL(p, s)). Exact decimal storage. The right choice for money, percentages, counts that need fractional precision. BETWEEN over NUMERIC is bit-for-bit predictable; no hidden rounding. -
Floating-point types (
FLOAT,DOUBLE PRECISION,REAL). IEEE-754 binary, 32 or 64 bits. Most decimal literals are approximations. BETWEEN over FLOAT can quietly include or exclude values you'd expect to fall the other way.
The money rule. If a column represents currency, store it as NUMERIC(precision, scale) (or scaled integer cents) — never FLOAT. The BETWEEN price * 0.9 AND price * 1.1 calculation you'd write for "within 10% of list price" is safe over NUMERIC and unsafe over FLOAT. Production schemas in Stripe, Square, Plaid, and every accounting system treat this as a non-negotiable.
BETWEEN with calculated expressions.
-
Constant-on-both-sides.
WHERE price BETWEEN 100 AND 200— index-friendly. -
Per-row expression on the right-hand side.
WHERE price BETWEEN cost * 0.9 AND cost * 1.1— re-evaluated per row, still index-friendly onpriceif there's a B-tree index. -
Function on the column.
WHERE LOWER(SUBSTRING(...)) BETWEEN 'a' AND 'm'— kills index usage. The optimiser can't see through the function. If you must filter on a transformed value, build a functional index onLOWER(SUBSTRING(...)).
The five most-asked numeric BETWEEN questions.
- "Is
BETWEEN 100 AND 200inclusive of 100 and 200?" — yes, both endpoints. - "Does BETWEEN work with NULL?" — no.
NULL BETWEEN a AND bis NULL (unknown), and NULL fails the WHERE clause; this is the standard three-valued logic story. - "Is BETWEEN faster than
>= AND <=?" — identical; they compile to the same plan. - "When is BETWEEN over a FLOAT a bug?" — when a literal endpoint can't be represented exactly in binary, the comparison may misorder by one ULP.
- "Should I use BETWEEN for percentages?" — only if the column is NUMERIC; if it's FLOAT, prefer NUMERIC casts or an explicit tolerance.
Worked example — BETWEEN over INT vs NUMERIC vs FLOAT
Detailed explanation. A small test table with three columns — one INT, one NUMERIC, one DOUBLE PRECISION — makes the precision difference visible in three rows. The INT and NUMERIC pass the same filter cleanly; the FLOAT row drops or admits values you didn't expect.
Question. A prices table stores the same logical value (e.g. 0.2) in three columns of different types. A query filters with BETWEEN 0.1 AND 0.3. Show which columns return all expected rows and which one silently misses one.
Input.
| id | i_value (INT) | n_value (NUMERIC) | f_value (DOUBLE) |
|---|---|---|---|
| 1 | 0 | 0.10 | 0.1 |
| 2 | 0 | 0.20 | 0.2 |
| 3 | 0 | 0.30 | 0.3 |
| 4 | 0 | 0.40 | 0.4 |
(i_value is filled with 0 to keep the row count constant; the focus is on the NUMERIC and FLOAT columns.)
Code.
-- 1. NUMERIC BETWEEN — predictable
SELECT id, n_value
FROM prices
WHERE n_value BETWEEN 0.10 AND 0.30
ORDER BY id;
-- 2. FLOAT / DOUBLE BETWEEN — precision-sensitive
SELECT id, f_value, f_value::text AS exact_repr
FROM prices
WHERE f_value BETWEEN 0.1 AND 0.3
ORDER BY id;
-- 3. The safe rewrite when storage is locked to FLOAT
SELECT id, f_value
FROM prices
WHERE f_value >= 0.1 - 1e-9 AND f_value <= 0.3 + 1e-9
ORDER BY id;
Step-by-step explanation.
-
NUMERIC version. Storage is exact decimal.
0.10 <= 0.10,0.10 <= 0.20,0.10 <= 0.30— all true;0.30 >= 0.10,0.30 >= 0.20,0.30 >= 0.30— all true; row 4 (0.40) fails. Rows 1, 2, 3 pass. -
FLOAT version.
0.1,0.2,0.3cannot all be stored exactly in IEEE-754. The actual binary representations are roughly0.10000000000000000555…,0.20000000000000001110…,0.29999999999999998890…. The endpoint literal0.3gets rounded one way; the stored0.3may be rounded the other way. In some engines this meansf_value=0.3failsf_value <= 0.3by one ULP, and the row drops out. -
The safe rewrite. Add a tolerance epsilon (
1e-9) on both sides. This widens the interval by less than 0.000000001 — wider than any FP rounding error for double precision in the range[0, 1], narrower than any meaningful business distinction. -
The real-world fix. Store currency, percentages, and exact decimals as
NUMERIC(p, s). FLOAT is for tolerance-aware analytics (sensor readings, ML feature columns) where ULP-level differences don't matter.
Output.
| Filter | Rows returned |
|---|---|
NUMERIC BETWEEN 0.10 AND 0.30
|
1, 2, 3 (deterministic) |
FLOAT BETWEEN 0.1 AND 0.3
|
usually 1, 2, 3 — but on some engines 3 silently drops |
FLOAT BETWEEN 0.1 - 1e-9 AND 0.3 + 1e-9
|
1, 2, 3 (safe) |
Rule of thumb. If you're writing a BETWEEN over a money or percentage column, the column should be NUMERIC. If it's FLOAT and you can't change the schema, widen the interval by a tolerance epsilon — or rewrite to half-open with explicit casts.
Worked example — BETWEEN with a function on the column kills the index
Detailed explanation. The single most common reason a BETWEEN query is slow on a large table is a function or CAST applied to the indexed column. The optimiser can't see through the function, so it falls back to a full table scan. The fix is either to drop the function (rewrite the query) or to materialise the function as a functional index.
Question. A products(id BIGINT, price NUMERIC(10,2)) table has a B-tree index on price. The query WHERE ROUND(price) BETWEEN 100 AND 200 is slow. Explain why and rewrite it.
Input (relevant table stats).
| Metric | Value |
|---|---|
| Row count | 10,000,000 |
| Index | CREATE INDEX prod_price_idx ON products(price) |
Selectivity of [100, 200]
|
~ 2% of rows |
Code.
-- ❌ Function on the indexed column — full scan
SELECT id, price
FROM products
WHERE ROUND(price) BETWEEN 100 AND 200;
-- ✅ Fix 1 — rewrite to use raw column
SELECT id, price
FROM products
WHERE price >= 99.5 AND price < 200.5;
-- ✅ Fix 2 — build a functional index on ROUND(price)
CREATE INDEX prod_round_price_idx ON products( (ROUND(price)) );
-- Now the original query can use the functional index
Step-by-step explanation.
-
Why the original is slow. The B-tree on
priceindexes the raw values 99.50, 99.51, 99.52, ….ROUND(price)is a computed value; the index has no entry for it. The planner sees no usable index and falls back to a sequential scan of all 10M rows. -
Fix 1 — rewrite the predicate.
ROUND(price) BETWEEN 100 AND 200is the same asprice >= 99.5 AND price < 200.5(becauseROUNDrounds half-up to the nearest integer). Now the predicate is on the raw column; the index is usable; the scan touches ~200K rows instead of 10M. -
Fix 2 — functional index. When the rewrite isn't possible (e.g.
LOWER(name) BETWEEN 'a' AND 'm'can't be expressed withoutLOWER), build an index on the expression itself:CREATE INDEX ... ON products( (ROUND(price)) ). The optimiser now matches the index to the predicate. -
Cost trade-off. Functional indexes carry the same
O(log N)insert cost as regular B-trees; they're slightly larger (the computed value is stored). Use them when the rewrite is impossible or the function appears in many queries.
Output (qualitative cost comparison).
| Form | Plan | Rows touched | Latency (typical) |
|---|---|---|---|
ROUND(price) BETWEEN ... (no functional index) |
Seq Scan | 10,000,000 | 2–5 s |
price >= 99.5 AND price < 200.5 |
Index Range Scan | ~200,000 | 5–20 ms |
ROUND(price) BETWEEN ... (with functional index) |
Index Range Scan | ~200,000 | 5–20 ms |
Rule of thumb. Functions and CASTs on the indexed column are the universal SQL performance footgun. Either keep the function off the column or materialise it as a functional index. Both BETWEEN and half-open inherit this rule; neither is a fix on its own.
SQL interview question on numeric range filtering
The probe usually sounds like: "We have a transactions(amount NUMERIC(12,2)) table and we want every transaction within ±10% of a baseline 100.00. Write the query, and explain why this is safe to do without a CAST."
Solution Using BETWEEN baseline * 0.9 AND baseline * 1.1 over NUMERIC
WITH baseline AS (SELECT 100.00::NUMERIC(12,2) AS amt)
SELECT t.id, t.amount
FROM transactions t, baseline b
WHERE t.amount BETWEEN b.amt * 0.9 AND b.amt * 1.1
ORDER BY t.amount;
Step-by-step trace.
| Step | What happens |
|---|---|
| 1 |
baseline.amt evaluates to 100.00 as NUMERIC(12,2). |
| 2 |
b.amt * 0.9 is computed in NUMERIC arithmetic → 90.00. |
| 3 |
b.amt * 1.1 similarly → 110.00. |
| 4 | The planner rewrites t.amount BETWEEN 90.00 AND 110.00 to t.amount >= 90.00 AND t.amount <= 110.00. |
| 5 | Any B-tree index on t.amount is used as a range scan over [90.00, 110.00]. |
Output:
| id | amount |
|---|---|
| 7 | 90.00 |
| 12 | 92.50 |
| 18 | 100.00 |
| 21 | 105.75 |
| 23 | 110.00 |
Why this works — concept by concept:
-
NUMERIC arithmetic —
NUMERIC * NUMERICis exact; no IEEE-754 rounding.100.00 * 0.9is bit-for-bit90.00, not89.99999999999999. -
Constant-folded bounds — the planner sees the bounds as constants for the duration of the query (after the join unfolds), so the B-tree index on
t.amountis used as a range scan. -
BETWEEN as syntactic sugar — the optimiser unfolds it to
>= AND <=before planning. There is no separate operator at the engine level. - No CAST needed — both sides are already NUMERIC; the engine performs the comparison in fixed-point arithmetic without a single rounding step.
-
Cost —
O(log N + K)where N is the table size and K is the number of matching rows. Identical to writing>= AND <=by hand.
SQL
Topic — aggregation (SQL)
Numeric range and aggregation problems (SQL)
3. Date BETWEEN — the time-truncation pitfall
sql between dates is the #1 production query bug in the entire range-query family
If you only remember one thing from this guide, make it this: created_at BETWEEN '2026-01-01' AND '2026-01-31' does not return "all of January". It returns every row from midnight Jan 1 through midnight Jan 31 — that is, January 1 through January 30 plus exactly the first second of January 31. Every row from 00:00:01 Jan 31 through 23:59:59.999999 Jan 31 is silently dropped.
The reason is simple — the right-hand operand '2026-01-31' is a DATE literal, and when it's compared against a TIMESTAMP column the engine casts it to '2026-01-31 00:00:00'. The comparison created_at <= '2026-01-31 00:00:00' is true for the first second of Jan 31 and false for every later moment that day. The bug is the implicit cast.
Where this surfaces in real systems.
- Daily / monthly revenue reports that miss the last day's revenue. Finance team notices a few percent gap; on-call engineer spends a day finding it.
-
Cohort analyses that drop "people who joined on day 31" because the cohort window was written as
joined BETWEEN cohort_start AND cohort_end. -
Backfill scripts that say
WHERE event_date BETWEEN '2026-01-01' AND '2026-01-31'and silently skip 23h 59m of Jan 31's events. - Audit queries that confirm "no records between dates X and Y" — and miss the records that landed on Y after midnight.
Three fixes — pick one and stay consistent.
-
Fix 1 — Cast the column to DATE.
created_at::date BETWEEN '2026-01-01' AND '2026-01-31'. Conceptually simplest. Performance downside: applies a function to the column on the left side of the comparison, which usually disables B-tree index usage. Acceptable on small tables; a problem on warehouse-scale tables. You can rescue it with a functional index on(created_at::date). -
Fix 2 — Half-open interval (the production default).
created_at >= '2026-01-01' AND created_at < '2026-02-01'. No function on the column; B-tree index oncreated_atis used as a range scan; correct down to the last microsecond. This is what every senior DE writes. -
Fix 3 —
date_trunc('day', …)semantics. Useful when the half-open boundary is dynamic (e.g. "from N days ago to now"):created_at >= date_trunc('day', NOW() - INTERVAL '30 days') AND created_at < date_trunc('day', NOW() + INTERVAL '1 day'). Same B-tree behaviour, expressive endpoints.
TIMESTAMP vs TIMESTAMPTZ in BETWEEN.
- TIMESTAMP (without time zone). The comparison happens at face value. If your column is naïve TIMESTAMP and your endpoints are naïve DATE literals, the engine casts the DATE to midnight in the same naïve clock — the pitfall above applies as described.
-
TIMESTAMPTZ (with time zone). The column is stored in UTC; literals are converted to UTC at parse time, based on the session's
timezonesetting. A'2026-01-31'in a session whose timezone isAmerica/New_Yorkbecomes2026-01-31 05:00:00+00in UTC — and your BETWEEN now drops the last 24 hours of the New York day. The pitfall is amplified: you need to be both date-aware and timezone-aware.
The safest pattern when timezones are in play: write the boundaries explicitly as timestamps with timezone, e.g. created_at >= '2026-01-01 00:00:00-05:00' AND created_at < '2026-02-01 00:00:00-05:00', or use date_trunc('day', created_at AT TIME ZONE 'America/New_York') — which is verbose but unambiguous.
Worked example — show the date BETWEEN bug and the half-open fix
Detailed explanation. A logins(user_id, login_at TIMESTAMP) table tracks user logins. We want every login in January 2026. The naïve query uses BETWEEN '2026-01-01' AND '2026-01-31'. We then rewrite it to half-open and observe the difference.
Question. With the rows below, return all January-2026 logins, then compare the BETWEEN query's result with the half-open rewrite and explain the gap.
Input.
| user_id | login_at |
|---|---|
| 1 | 2026-01-01 00:00:00 |
| 2 | 2026-01-15 12:30:00 |
| 3 | 2026-01-30 23:59:59 |
| 4 | 2026-01-31 00:00:00 |
| 5 | 2026-01-31 09:15:42 |
| 6 | 2026-01-31 23:59:59 |
| 7 | 2026-02-01 00:00:00 |
| 8 | 2026-02-01 09:00:00 |
Code.
-- Naïve BETWEEN — the bug
SELECT user_id, login_at
FROM logins
WHERE login_at BETWEEN '2026-01-01' AND '2026-01-31'
ORDER BY login_at;
-- Half-open rewrite — the fix
SELECT user_id, login_at
FROM logins
WHERE login_at >= '2026-01-01'
AND login_at < '2026-02-01'
ORDER BY login_at;
Step-by-step explanation.
-
Naïve BETWEEN execution. The engine unfolds to
login_at >= '2026-01-01 00:00:00' AND login_at <= '2026-01-31 00:00:00'. The right-hand bound is exactly midnight on Jan 31. -
Boundary at Jan 31 00:00:00. Row 4 (
login_at = 2026-01-31 00:00:00) satisfies<= '2026-01-31 00:00:00'— it passes by a single second. -
Boundary at Jan 31 09:15:42 and 23:59:59. Rows 5 and 6 are after midnight on Jan 31.
'2026-01-31 09:15:42' <= '2026-01-31 00:00:00'is false. They are silently dropped. -
Half-open rewrite. The upper bound becomes
< '2026-02-01'— equivalent to< '2026-02-01 00:00:00'. Every Jan 31 timestamp now passes (anything strictly before Feb 1 midnight). Rows 4, 5, 6 are all included. Rows 7 and 8 (Feb 1) are correctly excluded. -
The 23-hour gap. The BETWEEN query missed two of the three Jan-31 records. In a high-traffic production system this is roughly
~ traffic_per_dayof missing rows on every end-of-month report.
Output.
| Query | Rows returned (user_ids) | Missed |
|---|---|---|
BETWEEN '2026-01-01' AND '2026-01-31' |
1, 2, 3, 4 | 5, 6 (silently dropped) |
>= '2026-01-01' AND < '2026-02-01' |
1, 2, 3, 4, 5, 6 | none — fully correct |
Rule of thumb. When the column is a TIMESTAMP or TIMESTAMPTZ and you're filtering by calendar dates, never write BETWEEN. Write col >= start AND col < end_plus_one_day. The half-open pattern is the only one that is correct independent of the column's time resolution.
Worked example — TIMESTAMPTZ + session timezone amplifies the pitfall
Detailed explanation. A finance team in New York runs a "January revenue" report. The column is TIMESTAMPTZ; the session timezone is America/New_York. The query uses BETWEEN '2026-01-01' AND '2026-01-31'. The report comes in noticeably under the marketing team's projection — by both the missing-Jan-31 records and a five-hour timezone shift that pushes some early-Jan records into late-Dec UTC.
Question. With orders(placed_at TIMESTAMPTZ) and SET timezone = 'America/New_York', show how BETWEEN '2026-01-01' AND '2026-01-31' actually translates internally, and rewrite for a correct New-York-local "January" window.
Input (relevant rows).
| order_id | placed_at (in UTC) | placed_at (in America/New_York) |
|---|---|---|
| A | 2026-01-01 02:00:00+00 | 2025-12-31 21:00:00 (NY) — late December |
| B | 2026-01-01 12:00:00+00 | 2026-01-01 07:00:00 (NY) — Jan 1 morning |
| C | 2026-01-31 12:00:00+00 | 2026-01-31 07:00:00 (NY) — Jan 31 morning |
| D | 2026-01-31 23:00:00+00 | 2026-01-31 18:00:00 (NY) — Jan 31 evening |
| E | 2026-02-01 04:00:00+00 | 2026-01-31 23:00:00 (NY) — late Jan 31 |
Code.
SET timezone = 'America/New_York';
-- ❌ Naïve — dates interpreted in session TZ, then converted to UTC for compare
SELECT order_id, placed_at
FROM orders
WHERE placed_at BETWEEN '2026-01-01' AND '2026-01-31';
-- ✅ Fix — fully-qualified timestamps anchor to New-York-local
SELECT order_id, placed_at
FROM orders
WHERE placed_at >= '2026-01-01 00:00:00-05:00'
AND placed_at < '2026-02-01 00:00:00-05:00';
-- ✅ Alternative — date_trunc with AT TIME ZONE for self-documenting code
SELECT order_id, placed_at
FROM orders
WHERE (placed_at AT TIME ZONE 'America/New_York') >= '2026-01-01'
AND (placed_at AT TIME ZONE 'America/New_York') < '2026-02-01';
Step-by-step explanation.
-
What the naïve query becomes. With session TZ =
America/New_York,'2026-01-01'is parsed as2026-01-01 00:00:00 America/New_York, which is2026-01-01 05:00:00+00UTC. Similarly'2026-01-31'becomes2026-01-31 05:00:00+00. So the filter isplaced_at BETWEEN '2026-01-01 05:00:00+00' AND '2026-01-31 05:00:00+00'. -
Row A — UTC 02:00 Jan 1. In NY this is Dec 31; the user wants it excluded from "January NY revenue." The filter
>= '2026-01-01 05:00:00+00'excludes it (row A's02:00 < 05:00). Correct. - Row B — UTC 12:00 Jan 1. NY local: Jan 1 morning. Included. Correct.
-
Row C — UTC 12:00 Jan 31. NY local: Jan 31 morning. The filter
<= '2026-01-31 05:00:00+00'excludes it because12:00 > 05:00. Silently dropped — bug. - Row D — UTC 23:00 Jan 31. NY local: Jan 31 evening. Silently dropped.
- Row E — UTC 04:00 Feb 1. NY local: Jan 31 23:00. The "real" January (NY-local) should include this. Silently dropped.
-
The fix. Fully-qualified
'2026-01-01 00:00:00-05:00'and'2026-02-01 00:00:00-05:00'pin the bounds to midnight NY-local. Rows B, C, D, E all pass; row A correctly fails. The half-open form also avoids the closed-closed pitfall.
Output (count comparison).
| Form | Rows returned (order_ids) | Bug? |
|---|---|---|
BETWEEN '2026-01-01' AND '2026-01-31' (session TZ NY) |
B only | yes — C, D, E silently dropped |
Fully-qualified >= …-05:00 AND < …-05:00
|
B, C, D, E | none |
Rule of thumb. With TIMESTAMPTZ columns, never let date literals be interpreted in the session's timezone implicitly. Either anchor the bounds to fully-qualified TIMESTAMP WITH TIME ZONE literals, or convert the column to a local timestamp via AT TIME ZONE and compare against naïve date literals — both forms are explicit and reproducible across sessions.
SQL interview question on date range filtering
A common probe: "We want all orders placed in January 2026, where orders.placed_at is TIMESTAMP. Write the most correct, most index-friendly query. Then defend why you didn't use BETWEEN."
Solution Using a half-open interval >= '2026-01-01' AND < '2026-02-01'
SELECT order_id, placed_at, customer_id, amount
FROM orders
WHERE placed_at >= '2026-01-01'
AND placed_at < '2026-02-01'
ORDER BY placed_at;
Step-by-step trace.
| Step | Action | Effect |
|---|---|---|
| 1 | Parse '2026-01-01' and '2026-02-01' as DATE literals |
both implicit-cast to TIMESTAMP at 00:00:00 on the respective day |
| 2 | Plan with B-tree index on placed_at
|
range scan over [2026-01-01 00:00:00, 2026-02-01 00:00:00)
|
| 3 | Scan leaf pages for placed_at keys in the range |
every Jan-31 timestamp ≤ 23:59:59.999999 passes |
| 4 | Return matching rows in index order | results are time-sorted with no extra sort cost |
| 5 | Caller can replace the lower bound with date_trunc('month', NOW()) for "this month" reports — same shape, dynamic boundary |
Output:
| order_id | placed_at | customer_id | amount |
|---|---|---|---|
| 8801 | 2026-01-01 00:00:00 | 42 | 120.00 |
| 8802 | 2026-01-15 12:30:00 | 17 | 89.50 |
| 8803 | 2026-01-30 23:59:59 | 42 | 24.99 |
| 8804 | 2026-01-31 00:00:00 | 7 | 41.20 |
| 8805 | 2026-01-31 09:15:42 | 17 | 75.00 |
| 8806 | 2026-01-31 23:59:59 | 42 | 18.75 |
Why this works — concept by concept:
-
Half-open semantics —
[start, end)includes every microsecond fromstartthroughend - 1µs. Jan 31 in full, Feb 1 not at all. -
No function on the column — the comparison
placed_at >= '2026-01-01'is a direct B-tree key lookup; the optimiser uses the index as a range scan. Wrapping the column indate_trunc(...)or::datewould have disabled the index unless a functional index existed. -
Time-grain independence — the same query is correct whether
placed_atisTIMESTAMP,TIMESTAMPTZ, or evenTIMESTAMP(6)with microsecond precision. The half-open shape doesn't care. -
Composability for monthly buckets — back-to-back monthly windows
[Jan, Feb),[Feb, Mar)partition the year into non-overlapping, non-gapping intervals. Sum-by-month reports don't double-count and don't drop records. -
Cost —
O(log N + K)B-tree range scan. Identical to the brokenBETWEENform — the correctness difference is everything, the performance difference is zero.
SQL
Topic — date functions (SQL)
Date range and date-function problems (SQL)
4. Half-open intervals — the production pattern
[start, end) is the only interval notation that composes cleanly across SQL, Python, and pandas
Half-open intervals — closed on the lower end, open on the upper end — are the production default for one structural reason: adjacent half-open intervals don't overlap and don't gap. [Jan, Feb) ends just before Feb 1 00:00:00; [Feb, Mar) begins exactly at Feb 1 00:00:00. Every moment in time belongs to exactly one interval. This is the property that makes monthly cohort buckets, daily partition windows, and rolling 7-day aggregates summable.
![Visual diagram of half-open intervals — top a number line shows two back-to-back intervals [Jan, Feb) and Feb, Mar) without overlap; bottom a four-pattern matrix shows closed/open combinations [a,b], [a,b), (a,b], (a,b); a small OVERLAPS operator + tstzrange chip; on a light PipeCode card.
The four-quadrant interval notation.
-
[a, b]— closed-closed. What SQLBETWEENdoes. Both endpoints inclusive. Adjacent intervals double-count the shared boundary; consecutive[Jan, Feb-1],[Feb, Feb+27]triggers a "where does Feb 1 belong?" headache. -
[a, b)— closed-open. Lower inclusive, upper exclusive. The production default.a >= x AND x < b. Adjacent intervals compose cleanly. -
(a, b]— open-closed. Upper inclusive, lower exclusive. The mirror image; less common. Sometimes natural for "cumulative-up-to-and-including" buckets. -
(a, b)— open-open. Both endpoints excluded. Almost never the right tool; reserved for "strictly inside" mathematical statements.
Why half-open also wins for off-by-one safety.
-
Length is
b - a. A half-open interval over integers contains exactlyb - aintegers; over time, exactlyb - aof whatever time unit you're using. The closed-closed form over integers containsb - a + 1— the+ 1is where every off-by-one bug lives. -
Empty interval is
[a, a). Half-open intervals have a clean empty form; closed-closed has the awkward[a, a]which contains exactly one point. -
Slicing matches array/Python conventions. Python's
range(a, b)and slicingarr[a:b]are half-open. Half-open SQL aligns with the language your downstream code is written in.
ANSI SQL OVERLAPS — interval-vs-interval overlap testing.
- Postgres and Oracle support the
OVERLAPSoperator with the signature(start1, end1) OVERLAPS (start2, end2). It returns true iff the two intervals share at least one moment. - Equivalent expansion:
start1 < end2 AND start2 < end1— the standard interval overlap check. -
OVERLAPSis half-open by definition in the SQL standard — back-to-back(Jan, Feb)and(Feb, Mar)do not overlap. (Which is exactly the property you want.)
Postgres tstzrange — first-class interval columns.
- Range types (
int4range,numrange,daterange,tsrange,tstzrange) let you store an interval as a single value. Literal syntax'[a, b)'includes the closed/open markers in the value itself. - Operators:
@>(contains),&&(overlaps),-|-(is adjacent to). Index with GiST (USING gist (range_col)). - Useful for booking systems, availability calendars, time-bounded permissions — anywhere a row's natural payload is an interval.
Worked example — back-to-back monthly buckets without double-counting
Detailed explanation. A reporting team wants monthly revenue buckets for Jan, Feb, Mar 2026. Naïve BETWEEN over date ranges double-counts every month-boundary record and misses end-of-month records. Half-open buckets avoid both bugs.
Question. Compute revenue for Jan, Feb, Mar 2026 from orders(placed_at TIMESTAMP, amount NUMERIC) such that every order is counted exactly once and end-of-month orders are not dropped.
Input.
| order_id | placed_at | amount |
|---|---|---|
| 1 | 2026-01-15 10:00:00 | 100.00 |
| 2 | 2026-01-31 23:59:59 | 50.00 |
| 3 | 2026-02-01 00:00:00 | 75.00 |
| 4 | 2026-02-15 14:00:00 | 80.00 |
| 5 | 2026-02-28 23:59:59 | 30.00 |
| 6 | 2026-03-01 00:00:00 | 95.00 |
| 7 | 2026-03-15 09:00:00 | 60.00 |
Code.
SELECT
CASE
WHEN placed_at >= '2026-01-01' AND placed_at < '2026-02-01' THEN 'Jan'
WHEN placed_at >= '2026-02-01' AND placed_at < '2026-03-01' THEN 'Feb'
WHEN placed_at >= '2026-03-01' AND placed_at < '2026-04-01' THEN 'Mar'
END AS month_bucket,
SUM(amount) AS revenue
FROM orders
WHERE placed_at >= '2026-01-01' AND placed_at < '2026-04-01'
GROUP BY 1
ORDER BY 1;
Step-by-step explanation.
-
Order 2 — Jan 31 23:59:59. Falls in
[Jan, Feb) = [2026-01-01, 2026-02-01)because'2026-01-31 23:59:59' < '2026-02-01'is true. Counted in Jan. -
Order 3 — Feb 1 00:00:00.
'2026-02-01 00:00:00' < '2026-02-01'is false, so it fails the Jan bucket.'2026-02-01 00:00:00' >= '2026-02-01'is true, so it falls into the Feb bucket. Counted in Feb, not Jan. - Order 5 — Feb 28 23:59:59. Strictly before Mar 1 00:00:00, so it's in Feb.
- Order 6 — Mar 1 00:00:00. Belongs to Mar.
-
No double-count. Every order is assigned to exactly one bucket because the half-open intervals partition time without overlap. If we had used
BETWEEN(closed-closed) on the same boundaries, orders 3 and 6 would each belong to two buckets — and the report would silently inflate revenue.
Output.
| month_bucket | revenue |
|---|---|
| Jan | 150.00 |
| Feb | 185.00 |
| Mar | 155.00 |
Rule of thumb. When you bucket time into N consecutive intervals, always use half-open. Closed-closed buckets are a recipe for silent double-counting at every interval boundary.
Worked example — Postgres tstzrange for first-class interval columns
Detailed explanation. When a row's natural payload is an interval — a booking, a permission grant, a price-validity window — store it as a tstzrange instead of two columns. The closed/open marker lives in the value ('[)' is half-open, '[]' is closed-closed). Range operators (@>, &&, -|-) read more cleanly than the equivalent start1 < end2 AND … arithmetic, and a GiST index supports them all.
Question. A room_bookings(booking_id BIGINT, room_id INT, period tstzrange) table stores each booking as a single tstzrange with '[)' half-open semantics. A new request comes in for room_id=7 over [2026-03-10 14:00, 2026-03-10 16:00). Write the conflict query and explain how the index supports it.
Input.
| booking_id | room_id | period |
|---|---|---|
| 101 | 7 | [2026-03-10 10:00, 2026-03-10 12:00) |
| 102 | 7 | [2026-03-10 13:00, 2026-03-10 15:00) |
| 103 | 7 | [2026-03-10 15:30, 2026-03-10 17:00) |
| 104 | 7 | [2026-03-10 16:00, 2026-03-10 18:00) |
| 105 | 7 | [2026-03-11 09:00, 2026-03-11 10:00) |
Code.
-- Schema
CREATE TABLE room_bookings (
booking_id BIGSERIAL PRIMARY KEY,
room_id INT NOT NULL,
period TSTZRANGE NOT NULL,
EXCLUDE USING gist (room_id WITH =, period WITH &&)
);
-- Conflict query using the && (overlap) operator
SELECT booking_id, period
FROM room_bookings
WHERE room_id = 7
AND period && tstzrange('2026-03-10 14:00+00',
'2026-03-10 16:00+00',
'[)');
Step-by-step explanation.
-
Schema.
period TSTZRANGEstores both endpoints + the inclusion markers in one column. TheEXCLUDE USING gistconstraint prevents overlapping bookings for the same room at insert time — the database itself enforces the business rule. -
The
&&operator.a && bis true iff intervalsaandboverlap by at least one moment. Half-open semantics are respected —[10:00, 12:00)and[12:00, 14:00)do not overlap. -
Booking 101.
[10:00, 12:00)vs request[14:00, 16:00). No overlap. Skipped. -
Booking 102.
[13:00, 15:00)vs request[14:00, 16:00). Overlaps on[14:00, 15:00). Conflict. -
Booking 103.
[15:30, 17:00)vs request[14:00, 16:00). Overlaps on[15:30, 16:00). Conflict. -
Booking 104.
[16:00, 18:00)vs request[14:00, 16:00). Back-to-back; shared boundary 16:00 belongs to the later interval. No overlap. - Booking 105. Different day, no overlap.
Output.
| booking_id | period |
|---|---|
| 102 | [2026-03-10 13:00, 2026-03-10 15:00) |
| 103 | [2026-03-10 15:30, 2026-03-10 17:00) |
Rule of thumb. When the row models an interval (booking, permission, pricing window), use tstzrange over two-columns; pair with EXCLUDE USING gist to let the database refuse to insert conflicting rows. It's the cleanest interval API in any SQL dialect.
SQL interview question on interval overlap testing
A senior probe: "We have a bookings(room_id, starts_at TIMESTAMP, ends_at TIMESTAMP) table. A new request comes in for (room_id, request_start, request_end). Write a query that returns conflicting bookings for that room."
Solution Using ANSI OVERLAPS (or the equivalent start1 < end2 AND start2 < end1)
-- Postgres / Oracle — ANSI OVERLAPS operator
SELECT booking_id, starts_at, ends_at
FROM bookings
WHERE room_id = :room_id
AND (starts_at, ends_at) OVERLAPS (:request_start, :request_end);
-- Portable rewrite (works in every dialect)
SELECT booking_id, starts_at, ends_at
FROM bookings
WHERE room_id = :room_id
AND starts_at < :request_end
AND :request_start < ends_at;
Step-by-step trace.
Suppose :request_start = '2026-03-10 14:00', :request_end = '2026-03-10 16:00', room_id = 7:
| booking_id | starts_at | ends_at | starts_at < request_end? | request_start < ends_at? | Overlap? |
|---|---|---|---|---|---|
| 101 | 2026-03-10 10:00 | 2026-03-10 12:00 | 10:00 < 16:00 → true | 14:00 < 12:00 → false | no |
| 102 | 2026-03-10 13:00 | 2026-03-10 15:00 | 13:00 < 16:00 → true | 14:00 < 15:00 → true | yes |
| 103 | 2026-03-10 15:30 | 2026-03-10 17:00 | 15:30 < 16:00 → true | 14:00 < 17:00 → true | yes |
| 104 | 2026-03-10 16:00 | 2026-03-10 18:00 | 16:00 < 16:00 → false | 14:00 < 18:00 → true | no — back-to-back, half-open |
| 105 | 2026-03-11 09:00 | 2026-03-11 10:00 | 09:00 < 16:00 → false (next day) | 14:00 < 10:00 → false | no |
Output:
| booking_id | starts_at | ends_at |
|---|---|---|
| 102 | 2026-03-10 13:00 | 2026-03-10 15:00 |
| 103 | 2026-03-10 15:30 | 2026-03-10 17:00 |
Why this works — concept by concept:
-
Interval overlap identity — two half-open intervals
[s1, e1)and[s2, e2)overlap iffs1 < e2 AND s2 < e1. The condition is symmetric, branchless, and handles every case (containment, partial overlap, identical, back-to-back). -
Back-to-back is not an overlap — booking 104 (
16:00 → 18:00) does not overlap with the request ending at16:00. Half-open intervals treat the shared boundary as belonging to the later interval — exactly what a room scheduler wants. -
ANSI
OVERLAPS— Postgres and Oracle expose this as a built-in operator; SQL Server and MySQL require the manual<rewrite. Both forms compile to the same plan. -
Indexable — with
(room_id, starts_at)as a composite B-tree index, theroom_id = ?predicate is an equality seek and thestarts_at < ?is a range scan suffix. Add(room_id, ends_at)ifends_atbecomes the more selective side. -
Cost —
O(log N + K)per query; cheap enough to run on every booking-request POST.
SQL
Topic — filtering (SQL)
Interval, overlap, and range-filter problems (SQL)
5. BETWEEN performance — indexes, BRIN, partition pruning
sql range query performance is decided by three index strategies — B-tree, BRIN, and partition pruning
A BETWEEN (or its half-open equivalent) is only as fast as the index strategy the optimiser can apply to the column. The three strategies that actually matter for production-scale range queries are B-tree range scans (the default), BRIN block-range indexes (for time-ordered append-only data in Postgres), and partition pruning (for declaratively partitioned tables). Pick by table size and access pattern; combine when the data set crosses TB-scale.
Strategy 1 — B-tree range scan (the default).
- The planner descends the tree to the first key in the range, then scans contiguous leaf pages until it passes the upper bound.
- Cost:
O(log N + K)— logarithmic descent plus linear scan of matching keys. - Works on any column type with a total order —
INT,NUMERIC,DATE,TIMESTAMP,VARCHAR,BYTEA. - The default index in every dialect; no setup beyond
CREATE INDEX ... ON table(col). - Caveat: a function applied to the column (
WHERE date_trunc('day', created_at) BETWEEN ...) hides the column from the optimiser and forces a full scan. Either drop the function or build a functional index.
Strategy 2 — BRIN (Block Range Index) in Postgres.
- A BRIN stores one (min, max) summary per block range (default 128 pages, ~1MB on disk). At query time, the planner skips any block range whose summary doesn't intersect the WHERE predicate.
- Cost:
O(blocks_examined). For naturally time-ordered data (every insert is "now"), the index is tiny — often 1000× smaller than the equivalent B-tree. - The fit: append-only, time-ordered data, large tables. Classic workload — IoT, log shipping, metrics, event streams persisted in Postgres.
- The anti-fit: heavily-updated columns, columns not correlated with physical order, OLTP point lookups. BRIN excels at "scan a recent window of a huge table"; it's terrible at "find a specific row by id."
Strategy 3 — Partition pruning.
- Declaratively partition the table by date (or another low-cardinality range key) —
CREATE TABLE events ... PARTITION BY RANGE (event_date)in Postgres;PARTITION BY (date_trunc('month', event_ts))analogues in BigQuery, Snowflake, Redshift. - The planner inspects the WHERE clause and only reads the partitions whose range overlaps the predicate.
- Cost:
O(partitions_read × cost_per_partition). The 11 untouched months never load a page. - Works with cloud warehouses out of the box: BigQuery and Snowflake auto-prune by partition column without an explicit index.
When BETWEEN is the wrong choice for performance.
-
Function on the column —
WHERE LOWER(name) BETWEEN 'a' AND 'm'disables the B-tree onname. Fix: rewrite to half-open, or build a functional index. -
CAST on the column —
WHERE created_at::date BETWEEN '2026-01-01' AND '2026-01-31'disables the B-tree oncreated_at. Fix: rewrite to half-open without the CAST. -
OR across non-contiguous ranges —
WHERE x BETWEEN 1 AND 10 OR x BETWEEN 100 AND 110. The planner often falls back to a full scan. Fix:WHERE x IN (range1) OR x IN (range2)if discrete, or use a UNION ALL of two range queries. -
LIKE with leading wildcard —
WHERE name LIKE '%foo%'is a different beast but lives in the same "no index usable" family. Mentioned here for completeness; covered fully in the dedicatedLIKEinterview blog.
Combining strategies — the multi-TB pattern.
The pattern senior data engineers actually deploy on giant time-series tables is partition pruning + BRIN on the partition column (or B-tree on a more selective column inside the partition). The partition prune drops 95%+ of the table; the BRIN inside each surviving partition drops most of the remaining blocks; the final B-tree (e.g. on customer_id) finds the specific rows. Three layers of selectivity, each cheap.
Worked example — explain BRIN vs B-tree for a 1-billion-row events table
Detailed explanation. A Postgres table events(event_ts TIMESTAMP, event_type TEXT, user_id BIGINT, payload JSONB) has 1 billion rows; new events are inserted in order of event_ts (append-only). We need WHERE event_ts >= NOW() - INTERVAL '7 days' AND event_ts < NOW() to be fast.
Question. Compare a B-tree on event_ts to a BRIN on event_ts for this workload. Which one would you pick and why?
Input (relevant table stats).
| Metric | Value |
|---|---|
| Row count | 1,000,000,000 |
| Disk size | ~ 600 GB |
| 7-day window row count | ~ 25,000,000 (about 2.5% of total) |
| Physical insert order | strictly monotonic by event_ts
|
Code.
-- Option A: B-tree on event_ts
CREATE INDEX events_event_ts_btree
ON events (event_ts);
-- Option B: BRIN on event_ts
CREATE INDEX events_event_ts_brin
ON events USING BRIN (event_ts) WITH (pages_per_range = 128);
-- The query (same for both indexes)
SELECT event_type, COUNT(*)
FROM events
WHERE event_ts >= NOW() - INTERVAL '7 days'
AND event_ts < NOW()
GROUP BY event_type
ORDER BY COUNT(*) DESC;
Step-by-step explanation.
-
B-tree size. A B-tree on 1B rows of TIMESTAMP is roughly
~ 1B × 24 bytes(key + pointer overhead) = ~ 25 GB. Big, but acceptable. -
BRIN size. With
pages_per_range=128and the table at ~75M pages, the BRIN has75M / 128 ≈ 590Kblock ranges; each summary is ~20 bytes. Total BRIN size ≈ 12 MB. Three orders of magnitude smaller than the B-tree. -
Query against B-tree. Descend the tree to the first key ≥
NOW() - 7d, then scan leaf pages until pastNOW(). Touches ~25M index entries plus the corresponding heap pages. Cost: I/O ≈ 25M random-ish heap fetches. -
Query against BRIN. Inspect each of the ~590K block-range summaries, keep the ones whose
[min_ts, max_ts]overlap the 7-day window. Because the table is monotonically inserted, only ~7 / 365 × 590K ≈ 11Kblock ranges are kept → ~1.4M pages → ~ 25M rows scanned sequentially. The crucial difference: sequential I/O, not random. -
Why BRIN wins here. Sequential I/O is 10–30× faster than random I/O on spinning disks and 3–10× faster on SSDs. For "recent window of a huge, append-only table," BRIN is the right answer. For "find a specific event by id," B-tree on
idis. -
Caveat — when BRIN regresses. If the physical insert order drifts away from time order (e.g. a backfill prepends old data), BRIN block-range summaries become loose and the index degenerates to a full scan. Use
pg_summarize_brin_ranges()andREINDEXto keep it healthy.
Output (qualitative cost comparison).
| Aspect | B-tree on event_ts | BRIN on event_ts |
|---|---|---|
| Index size | ~ 25 GB | ~ 12 MB |
| Insert overhead per row | tree rebalancing | append summary block |
| Query I/O pattern | random-ish heap fetches | sequential block ranges |
| 7-day query latency (typical) | 30–120s | 3–8s |
| Best for | point lookups + small windows | recent window of large append-only tables |
Rule of thumb. For time-ordered append-only Postgres tables larger than ~ 100M rows, always evaluate BRIN before B-tree for the timestamp column. The 1000× size saving and the sequential I/O pattern together turn what used to be a 2-minute query into a 5-second query.
Worked example — composite index for (room_id, period) range queries
Detailed explanation. When a range query also filters by a low-cardinality equality column (room_id, tenant_id, customer_id), the textbook strategy is a composite index with the equality column first and the range column second. The optimiser uses the equality column as an index seek, then the range column as a contiguous suffix scan — cost stays O(log N + K) and only the matching equality bucket is touched.
Question. With bookings(room_id INT, starts_at TIMESTAMP, ends_at TIMESTAMP, …) and the typical query WHERE room_id = ? AND starts_at >= ? AND starts_at < ?, design the optimal composite index and trace the plan.
Input (relevant table stats).
| Metric | Value |
|---|---|
| Row count | 50,000,000 |
Distinct room_id
|
5,000 |
Average rows per room_id
|
~ 10,000 |
Selectivity of a 1-day starts_at window |
~ 0.3% within a room |
Code.
-- Composite B-tree, equality column FIRST
CREATE INDEX bookings_room_starts_idx
ON bookings (room_id, starts_at);
-- The query
SELECT booking_id, starts_at, ends_at
FROM bookings
WHERE room_id = 7
AND starts_at >= '2026-03-10'
AND starts_at < '2026-03-11';
Step-by-step explanation.
-
Why
room_idfirst. The optimiser uses each leading column as an equality seek; only the trailing column can be a range. With(room_id, starts_at), an exact match onroom_idnarrows the index to one contiguous slice; the range onstarts_atthen scans within that slice. -
Why not
(starts_at, room_id). Reversing the order makesstarts_atthe leading column, which is the range. The optimiser still uses the index — but it now has to scan every key in the time window across every room (5,000× more pages), then filter forroom_id=7after the fact. -
Plan trace (correct order). Seek to
(7, '2026-03-10')→ scan contiguous keys until(7, '2026-03-11')→ ~30 matching rows → done in ~5–10 µs. -
Plan trace (wrong order). Seek to
('2026-03-10', *)→ scan ~150K keys (entire day across 5000 rooms) → filter toroom_id=7→ ~30 matching rows in ~50–100 ms. 10,000× more I/O for the same answer. -
The general rule. Equality columns first, range column last in the composite index. The leftmost-prefix rule means the index also covers queries that filter on
room_idalone (e.g. "all bookings for room 7 across all time") — a bonus you don't get from(starts_at, room_id).
Output (qualitative trace).
| Index design | Plan | Rows touched | Latency |
|---|---|---|---|
(room_id, starts_at) |
Index Seek + Range | ~30 | ~5 µs |
(starts_at, room_id) |
Index Range + Filter | ~150,000 | ~50 ms |
(room_id) only |
Index Seek + heap filter | ~10,000 | ~5 ms |
| no index | Seq Scan | ~50,000,000 | ~30 s |
Rule of thumb. For mixed equality + range predicates, the composite index is (equality_cols…, range_col). The equality columns can be any order among themselves; the range column always last. If you have two range columns, only the first one in the composite is index-friendly — the second has to be applied as a post-filter.
Worked example — read an EXPLAIN ANALYZE for a BETWEEN range query
Detailed explanation. The fastest way to confirm that a range query is doing what you think is to read its plan with EXPLAIN (ANALYZE, BUFFERS). The three signals to look for are the node type (Index Scan good, Seq Scan bad without a reason), the Index Cond (the BETWEEN expression should appear here, not in Filter), and the rows planned vs actual divergence (a >10× miss means the statistics are stale).
Question. Given the plan output below for WHERE created_at BETWEEN '2026-05-01' AND '2026-05-07' on a Postgres events table with index events_created_at_idx, identify whether the query is using the index correctly and what to fix if not.
Input (the plan output).
QUERY PLAN
-----------------------------------------------------------------------------
Aggregate (cost=12340.12..12340.13 rows=1 width=8)
(actual time=84.121..84.121 rows=1 loops=1)
Buffers: shared hit=4123 read=58
-> Index Scan using events_created_at_idx on events
(cost=0.43..12134.65 rows=82188 width=4)
(actual time=0.018..70.301 rows=81342 loops=1)
Index Cond: ((created_at >= '2026-05-01 00:00:00'::timestamp)
AND (created_at <= '2026-05-07 00:00:00'::timestamp))
Buffers: shared hit=4123 read=58
Planning Time: 0.121 ms
Execution Time: 84.158 ms
Code (how to run the analysis yourself).
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM events
WHERE created_at BETWEEN '2026-05-01' AND '2026-05-07';
Step-by-step explanation.
-
Node type —
Index Scan using events_created_at_idx. Good. The optimiser picked the B-tree oncreated_atand used it as a range scan. If this saidSeq Scan, the column has either no index, a function on the column, or the planner thought a scan was cheaper than the index (statistics issue). -
Index Cond vs Filter. The BETWEEN appears under
Index Cond— meaning the index narrowed the scan to the matching keys before any rows were fetched. If it appeared underFilter, the index would have been used for some other column and the BETWEEN would be a post-filter (much more I/O). -
Planned vs actual rows —
82188 vs 81342. Within 1%; statistics are fresh. A 10×+ miss here would suggest runningANALYZE eventsto refresh the planner's row estimates. -
Buffers —
hit=4123 read=58. Most of the index + heap pages were already in the shared buffer cache; 58 pages came from disk. A highreadcount on repeated runs is a sign the table is too cold for the cache; consider preloading or scaling the cache. -
The time-truncation pitfall is still present. The
Index Condshows<= '2026-05-07 00:00:00'— confirming the BETWEEN dropped 23h 59m of May 7. The index is doing its job perfectly; the query is just wrong. Fixing the query to>= '2026-05-01' AND < '2026-05-08'preserves the index plan and corrects the boundary.
Output — the diagnostic table.
| Signal | Value (this plan) | Interpretation |
|---|---|---|
| Node type | Index Scan |
Good — index is being used |
| Index name | events_created_at_idx |
The B-tree on created_at
|
| Index Cond contains BETWEEN? | yes | Index is narrowing the scan |
| Rows planned vs actual | 82188 vs 81342 | Statistics fresh (within 1%) |
| Buffers cache hit ratio | 4123 / 4181 = 98.6% | Working set is in cache |
| Time-truncation pitfall in Index Cond? | yes — <= '2026-05-07 00:00:00'
|
Query has the date BETWEEN bug |
Rule of thumb. For every range query that matters, run EXPLAIN (ANALYZE, BUFFERS) once. If you see Index Scan with the BETWEEN under Index Cond, the optimiser is doing its part — but verify the boundary is correct independently. The plan won't tell you that the upper bound is silently dropping 23 hours of data.
SQL interview question on partition pruning + range filter
A staff-level probe: "Our events table is 8 TB, partitioned by event_date daily. The analytics team writes WHERE date_trunc('day', event_ts) BETWEEN '2026-05-01' AND '2026-05-07'. Their dashboards are slow. What's wrong, and what would you rewrite to?"
Solution Using a half-open range without date_trunc on the column
-- ❌ The problem — date_trunc on the column disables partition pruning
SELECT event_type, COUNT(*)
FROM events
WHERE date_trunc('day', event_ts) BETWEEN '2026-05-01' AND '2026-05-07'
GROUP BY event_type;
-- ✅ The fix — half-open range on the raw partition key
SELECT event_type, COUNT(*)
FROM events
WHERE event_ts >= '2026-05-01'
AND event_ts < '2026-05-08'
GROUP BY event_type;
Step-by-step trace.
| Step | What happens (broken query) | What happens (fixed query) |
|---|---|---|
| 1 | Planner sees date_trunc('day', event_ts)
|
Planner sees event_ts >= ? AND event_ts < ?
|
| 2 | Cannot match expression to partition key | Matches partition key (event_ts) directly |
| 3 | Falls back to scanning every daily partition (potentially years of data) | Prunes to exactly the 7 daily partitions for May 1–7 |
| 4 | I/O = full table read (~ 8 TB) | I/O = 7 daily partitions (~ 7 × 22 GB) |
| 5 | Latency = 5–10 minutes | Latency = 5–15 seconds |
| 6 | The BETWEEN endpoint '2026-05-07' would also drop 23h 59m of May 7 in the broken form even after pruning — the same time-truncation pitfall reappears at the application layer |
Half-open < '2026-05-08' captures all of May 7 |
Output:
| Query form | Partitions scanned | I/O | p95 latency |
|---|---|---|---|
date_trunc(event_ts) BETWEEN … |
all (no pruning) | ~ 8 TB | 5–10 min |
event_ts >= … AND event_ts < … |
7 (pruned) | ~ 154 GB | 5–15 s |
Why this works — concept by concept:
-
Partition pruning depends on a literal predicate on the partition key — the planner can match
event_ts >= constantto the partition'sRANGE (event_ts)spec at plan time.date_trunc('day', event_ts)is opaque — the planner has no rule that says "this expression is bounded by the same range asevent_ts." -
Half-open avoids the time-truncation pitfall a second time — even if partition pruning worked,
BETWEEN '2026-05-01' AND '2026-05-07'would still drop 23h 59m of May 7. The half-open< '2026-05-08'is correct on both axes. - No function on the column is the universal rule — applies to B-tree (kills index), BRIN (kills the min/max summary match), and partition pruning (kills the partition match). Keep functions off the column.
- Same lesson, three engines — BigQuery's "partition pruning" and Snowflake's "automatic clustering" follow the identical rule: filter on the raw partition / cluster column, not a transformed expression.
-
Cost — pruning reduces I/O by
1 / num_partitions × scanned_partitionsand reduces latency proportionally. On a daily-partitioned table with 365 partitions and a 7-day filter, the speed-up is ~ 50×.
SQL
Topic — date functions (SQL)
Partitioned date-range performance problems (SQL)
Cheat sheet — interval recipes for production
Pick the recipe by the intent expressed in the requirement, not by the keyword the requirement happened to use. "All of January" is almost always a half-open monthly window, even if the spec says BETWEEN.
-
Last 7 days (rolling, including now).
created_at >= NOW() - INTERVAL '7 days' AND created_at < NOW()— half-open, anchored to wall-clock. -
This calendar month, dynamic.
created_at >= date_trunc('month', NOW()) AND created_at < date_trunc('month', NOW() + INTERVAL '1 month'). -
Specific calendar month (Jan 2026).
created_at >= '2026-01-01' AND created_at < '2026-02-01'. Same shape, fixed endpoints. -
Year-to-date.
created_at >= date_trunc('year', NOW()) AND created_at < NOW(). -
Inclusive numeric range (e.g. test scores 60–100).
score BETWEEN 60 AND 100— the closed-closed shape is what the human spec usually wants for grading thresholds. -
Half-open numeric range (e.g. price tier
[100, 200)).price >= 100 AND price < 200— for back-to-back tiers ([100,200),[200,300), …) without overlap. -
Tolerance-aware FLOAT compare.
value >= target - 1e-9 AND value <= target + 1e-9— when the column is locked to FLOAT and you need approximate equality. -
Interval overlap test (booking conflict).
(a_start, a_end) OVERLAPS (b_start, b_end)(Postgres / Oracle) or the portablea_start < b_end AND b_start < a_end. -
First-class interval column (Postgres).
tstzrange('2026-01-01', '2026-02-01', '[)')— store the closed-open marker in the value itself; query with@>,&&,-|-operators backed by a GiST index. -
Last-N-minutes for streaming dashboards.
event_ts >= NOW() - INTERVAL '5 minutes' AND event_ts < NOW(). Paired with BRIN onevent_ts+ daily partitions, sub-second on TB-scale. -
Numeric BETWEEN on a calculated column.
WHERE price BETWEEN cost * 0.9 AND cost * 1.1— both sides per-row, still index-friendly onprice. -
String BETWEEN.
name BETWEEN 'A' AND 'M'— lexicographic, case-sensitive; for case-insensitive useLOWER(name) BETWEEN 'a' AND 'm'with a functional index onLOWER(name).
A small heuristic for the interview-round version of the question: when the interviewer says "between dates X and Y," ask "inclusive of Y, or up-to-but-not-including Y?" In 90% of real production requirements the answer is up-to-but-not-including — and the half-open form is the right one.
Frequently asked questions
Is SQL BETWEEN inclusive or exclusive?
BETWEEN is inclusive on both ends in every SQL dialect — x BETWEEN a AND b is exactly x >= a AND x <= b. This is fixed by the ANSI SQL standard; Postgres, MySQL, SQL Server, Oracle, BigQuery, Snowflake, and SQLite all agree. If you need to exclude one of the endpoints, rewrite to half-open: x >= a AND x < b (closed-open, the production default), x > a AND x <= b (open-closed), or x > a AND x < b (open-open). The optimiser compiles all four forms to the same B-tree range scan, so the choice is purely about correctness.
Why do my date BETWEEN queries miss records on the last day?
Because BETWEEN '2026-01-01' AND '2026-01-31' over a TIMESTAMP column is column >= '2026-01-01 00:00:00' AND column <= '2026-01-31 00:00:00' — the right-hand date literal is implicitly cast to midnight on Jan 31, so every row from 00:00:01 through 23:59:59.999999 of that day is silently excluded. This is the #1 production query bug in the entire range-query family. The fix is either to cast the column to DATE (slow on large tables) or — preferred — use a half-open interval: column >= '2026-01-01' AND column < '2026-02-01'. The half-open form is correct regardless of whether the column is DATE, TIMESTAMP, or TIMESTAMPTZ.
Is BETWEEN slower than >= AND <=?
No — they are bit-for-bit identical at the plan level. Every modern SQL optimiser unfolds x BETWEEN a AND b to x >= a AND x <= b before planning, so the choice is a readability and convention decision, not a performance one. The performance question that does matter is whether a function or CAST has been applied to the column — WHERE created_at::date BETWEEN ... disables B-tree index usage even though BETWEEN itself is free. Keep functions off the column, then BETWEEN and >= AND <= are interchangeable.
Can BETWEEN use an index?
Yes — WHERE indexed_col BETWEEN a AND b is a textbook B-tree range scan with cost O(log N + K), where N is the table size and K is the matching row count. For very large, time-ordered, append-only tables (think 100M+ rows), evaluate a BRIN (block-range index) on the timestamp column — it's often 1000× smaller than the B-tree and faster on "recent window" scans because of sequential I/O. For declaratively partitioned tables (Postgres native partitioning, BigQuery, Snowflake), a BETWEEN on the partition key triggers partition pruning — the planner skips every partition that doesn't overlap the predicate. All three strategies require the WHERE clause to reference the raw column, not a transformed expression.
Does BETWEEN work with TIMESTAMP and TIMESTAMPTZ?
Yes, but with two caveats. First, BETWEEN over a TIMESTAMP column is inclusive on both ends, so the same time-truncation pitfall applies: BETWEEN '2026-01-01' AND '2026-01-31' will silently drop 23h 59m of Jan 31. Use a half-open interval (>= '2026-01-01' AND < '2026-02-01') instead. Second, with TIMESTAMPTZ the right-hand date literal is interpreted in the session's time zone before being converted to UTC for storage and comparison — so a '2026-01-31' in a session set to America/New_York becomes 2026-01-31 05:00:00 UTC, which can shift the boundary by a full calendar day. When timezones matter, write the boundaries as fully-qualified TIMESTAMP WITH TIME ZONE literals ('2026-01-01 00:00:00-05:00') or use date_trunc('day', col AT TIME ZONE 'America/New_York') for an explicit local-day window.
Does BETWEEN match NULL values?
No. SQL is three-valued — NULL is neither true nor false; comparing NULL against anything yields NULL (unknown). So NULL BETWEEN 1 AND 10 evaluates to NULL, which fails the WHERE clause (only rows that evaluate to true are returned). If you want to include rows where the column is NULL, write it explicitly: WHERE (x BETWEEN 1 AND 10) OR x IS NULL. The same rule applies to NOT BETWEEN — NULL NOT BETWEEN 1 AND 10 is also NULL, so neither BETWEEN nor NOT BETWEEN will pick up NULL rows on its own. This is a frequent interview gotcha: the candidate writes WHERE amount NOT BETWEEN 100 AND 200 to filter out a range and is surprised that NULL amounts also disappear.
Practice on PipeCode
- Drill the SQL practice library → to lock in BETWEEN, half-open ranges, and WHERE-clause patterns across every difficulty tier.
- Rehearse filtering problems (SQL) → for the day-to-day range-query mechanics interviewers always probe.
- Stress-test edge cases with date functions (SQL) → and date arithmetic (SQL) → — every pitfall in §3 has a problem set.
- Layer in aggregation problems (SQL) → for the monthly-bucket and rolling-window queries that depend on correct interval semantics.
- For the broader interview surface, read top data engineering interview questions →.
- Calibrate your numeric column choices with SQL data types in PostgreSQL → , the prerequisite for understanding why NUMERIC beats FLOAT for money.
- Stack the prerequisites with the only 5 skills you need to become a data engineer →.
- Reinforce the SQL fundamentals with SQL for data engineering interviews — from zero to FAANG →.
- For the design-round muscles around partitioning and pruning, work through ETL system design for DE interviews →.
- To pair range queries with table modelling, browse data modelling for DE interviews →.
Pipecode.ai is Leetcode for Data Engineering — every BETWEEN, half-open, and partition-pruning concept above ships with hands-on practice rooms where you run real SQL against real data sets and see exactly which rows your query missed. Start with the SQL filtering library and work outward; PipeCode pairs every reading with 450+ DE-focused problems and a real-time scoring engine.
![PipeCode blog header for a SQL BETWEEN + range queries deep dive — bold white headline 'SQL BETWEEN · Range Queries' with subtitle 'Numeric · Date · Inclusive vs Exclusive · Half-Open' and a stylised [a, b] inclusive bracket visual on a dark gradient with purple, green, orange, and blue accents and a small pipecode.ai attribution.](https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F552mhixxeedxf2nijybo.jpeg)



Top comments (0)