Yuva Kunaal

Posted on May 18

"Wrong Answer" is the worst feedback you can give a SQL learner — so I built something better

#sql #python #beginners #backend

You submit a query. It's wrong.

The platform says: Incorrect result.

That's it.

No diff. No hint. No explanation of which rows are wrong or why. Just a red banner telling you to try again.

I've used enough SQL learning tools to know this is the standard. And I think it's the single biggest reason people get stuck and give up — not because SQL is hard, but because the feedback loop is broken.

So when I built SQLumina, I made the debug experience the core feature. Here's exactly how it works under the hood.

The problem with "wrong answer"

When a SQL query returns incorrect results, there are maybe 15 different reasons why:

Wrong JOIN type (INNER when you needed LEFT)
Joining on the wrong columns (PK to PK instead of FK to PK)
Missing GROUP BY column
WHERE clause firing before aggregation
HAVING threshold off by one
LIKE missing a wildcard
NULL comparison written as col = NULL instead of IS NULL

Each of these produces a different shape of wrong output. A generic "incorrect" banner treats them all the same. That's useless.

What a learner actually needs is: which rows are wrong, and why.

Step 1 — Building the row-level diff

Every challenge in SQLumina has a reference solution stored server-side. When a user submits, the backend runs both queries concurrently against the live PostgreSQL database:

user_result, ref_result = await asyncio.gather(
    run_query(user_sql, schema),
    run_query(reference_sql, schema)
)

Then it computes three sets:

correct = user_rows & reference_rows
missing = reference_rows - user_rows
extra   = user_rows - reference_rows

These map directly to the three columns in the debug panel: Correct, Missing, Extra.

The diff table shows full row values — all columns, no truncation. If a row is missing because your JOIN dropped it, you can see exactly which row it was.

Step 2 — Accuracy scoring with Jaccard similarity

Once you have the three sets, you need a single accuracy number.

I use Jaccard similarity:
accuracy = |correct| / (|correct| + |missing| + |extra|)

The reason this matters: a simple "percentage correct" would reward over-fetching. If the expected result is 10 rows and you return 10,000, you'd technically have 100% of the correct rows — you'd just also have 9,990 extra ones.

Jaccard penalises both under-fetching and over-fetching. Getting 15 rows when 10 were expected is scored lower than getting exactly 10 right. This matches how production SQL correctness actually works.

Step 3 — Parsing the query AST

Showing the diff is useful. Explaining why the diff exists is what makes it genuinely educational.

This is where it gets interesting.

I built a 30+ rule heuristic engine that parses the user's actual SQL to extract:

Table names and aliases
JOIN conditions (specifically the ON clause columns)
SELECT columns and whether they're wrapped in aggregates
WHERE predicates and their operators
GROUP BY columns
HAVING conditions

Then it matches those parsed elements against the diff to produce specific diagnosis cards.

Example rule — wrong JOIN key detection:

def check_join_key(parsed, diff):
    for join in parsed.joins:
        left_col  = join.condition.left.column
        right_col = join.condition.right.column
        left_ref  = get_table_pk(join.condition.left.table)
        right_ref = get_table_pk(join.condition.right.table)

        if left_col == left_ref and right_col == right_ref:
            return DiagnosisCard(
                severity="high",
                message=f"You're joining {join.condition.left.table}.{left_col} "
                        f"to {join.condition.right.table}.{right_col}. "
                        f"Both are primary keys. You probably want the foreign key instead.",
                fix=generate_corrected_join(join)
            )

The card surfaces in the UI with a severity badge, confidence percentage, the specific columns involved, a corrected SQL fragment, and a plain-language explanation of the underlying concept.

Step 4 — The zero-rows suite

A separate set of 7 rules fires specifically when the user's query returns zero rows against a non-empty expected result. These are the most frustrating bugs to debug because the diff table is just... empty.

The rules detect:

Pattern	Issue
`col = NULL`	Should be `IS NULL`
`LIKE 'foo'`	Missing wildcard — should be `LIKE '%foo%'`
`BETWEEN hi AND lo`	Bounds are reversed
`NOT IN (...)` with NULL in list	NULL silently excludes every row
`date_col > CURRENT_DATE + INTERVAL '30 days'`	Future filter — should subtract
WHERE clause over-filtered	Too many AND conditions, no rows survive
No WHERE when one is required	Missing filter entirely

Each zero-row rule names the exact predicate from the user's query and suggests the corrected version. No generic messaging.

Step 5 — The pgBouncer bug that took me days

This one isn't about the debugger directly. It's about the infrastructure that makes sandboxed query execution reliable — and it's the bug that cost me the most time.

SQLumina runs on Neon's serverless PostgreSQL with asyncpg. Each user query runs inside a schema — ecommerce, social_media, or banking — and sandbox queries run in an isolated sandbox_{user_id} schema.

To route queries to the right schema, I was setting the search path before executing:

await conn.execute(f"SET search_path TO {schema}")
await conn.execute(user_sql)

In development: worked perfectly.

In production under any real load: randomly failing with relation "products" does not exist.

Not consistently. Not reproducibly. Just sometimes, on concurrent requests.

The root cause: Neon uses pgBouncer in transaction mode.

In transaction mode, pgBouncer doesn't guarantee that two consecutive statements from the same application connection hit the same backend PostgreSQL connection. The SET search_path runs on connection A. The actual query lands on connection B — which has a fresh search path pointing at public, where your tables don't exist.

The fix:

async with conn.transaction():
    await conn.execute(f"SET LOCAL search_path TO {schema}")
    await conn.execute(f"SET LOCAL statement_timeout = '5000'")
    result = await conn.fetch(user_sql)

SET LOCAL scopes the change to the current transaction. Since the transaction is atomic, both statements are guaranteed to execute on the same backend connection. pgBouncer can't split them.

Two lines. Took days to find. Now it never fails.

If you're building on Neon and hitting "relation does not exist" errors randomly under load — this is your fix.

Step 6 — Sandboxing safely

A few things that keep user queries from doing damage:

Comment stripping — SQL comments (-- and /* */) are stripped before parsing so users can't hide keywords inside comments to bypass validation.

Destructive operation blocking — DROP DATABASE, TRUNCATE outside sandbox scope, and GRANT are blocked at the parser level before execution.

Auto-LIMIT — LIMIT 1000 is appended when absent so a SELECT * on a large table doesn't flood the frontend.

Timeout — SET LOCAL statement_timeout = '5000' kills queries at the DB level after 5 seconds. A Python asyncio.wait_for at 6 seconds is a belt-and-suspenders fallback — PostgreSQL cancels cleanly before Python times out, which prevents dirty pool connections.

Read-only enforcement — The main ecommerce, social_media, and banking schemas are strictly read-only from the Playground. Only the user's own sandbox schema allows DML.

Per-user isolation — Every user gets their own sandbox_{user_id} schema. Other users' tables are completely invisible. Schema names are case-folded and sanitised before interpolation so two Clerk IDs that differ only in case can never collide.

What I learned

Building this taught me more SQL than taking any course did.

You can't write a rule that detects "you're joining on two primary keys" without understanding deeply why that's wrong and what the correct foreign key relationship should look like. You can't write the zero-rows suite without knowing every way a query can silently exclude all its data.

The debug engine is now 30+ rules. It's also the thing I'm most proud of in the entire codebase.

If you're building anything where users write queries against real data — especially in a learning context — invest in the feedback layer before anything else. The SQL execution part is easy. Telling someone specifically what they got wrong is the hard problem.

SQLumina is about to launch...

Top comments (1)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.