Show HN: I replaced database deadlock victim selection with an LLM — here's the data

#ai #database #llm #opensource

Every mainstream database uses fixed rules for deadlock victim selection.

MySQL kills the one with the fewest locks. CockroachDB kills the youngest. PostgreSQL kills the one that closed the cycle. Not one of them can explain "why you" — the killed transaction just knows it got rolled back, no reason given.

I wanted to see what happens if an LLM makes that decision instead.

What I Built

DDA is a Sidecar process. It sits outside the database, polls lock state, builds a wait-for graph, runs DFS cycle detection, and — only at the "choose a victim" step — calls an LLM (deepseek-v4-pro via Anthropic SDK). The other four steps (SQL execution, lock parsing, graph construction, rollback) are pure deterministic Python code.

The database is rookieDB, Berkeley's CS186 teaching database. It has full multi-granularity locking (IS/IX/S/SIX/X) but exactly zero deadlock detection — the minimal experimental environment DDA needs. I chose it not because I couldn't touch a real database, but because its codebase (~tens of thousands of lines of Java) was readable enough to fully understand the locking system in my spare time. The Sidecar architecture doesn't bind to any database — adapting to PostgreSQL means swapping one data source. No database code changes needed.

Honest limitations upfront:

LLM latency is 2-4 seconds per call. Fixed rules are microseconds. This is a real tradeoff.
rookieDB is a teaching system, not a production database.
Only 3 deadlock scenarios. Not comprehensive.
In production, deadlock prevention (wound-wait) matters more than detection + rollback. Victim selection is a relatively niche problem.

But it's a real, finished project with some findings I think are interesting.

The Experiment

Three fixed-rule baselines first:

Strategy	Logic	Database Analogy
MinLocks	Fewest locks → kill	MySQL
YoungestFirst	Most recently started → kill	CockroachDB
CycleTrigger	Deepest in wait queue → kill	PostgreSQL/Oracle

Then the LLM. 3 scenarios × 4 strategies = 12 runs.

12/12 deadlocks detected and resolved. Zero residual locks. Zero hallucinated victims. Zero fallbacks triggered.

The most telling data — cascading_deadlock scenario. T1 holds 12 locks (2 exclusive X write locks), T2 holds 8 locks (1 exclusive X write lock):

MinLocks:      "T10 holds 8 locks — fewest"
YoungestFirst: "T22 started at ... — youngest"  
CycleTrigger:  "T33 is last in request queue"
LLM:           "T124 holds fewer locks (8 vs 12) and only one X write lock
                vs T123's two — lower rollback cost"

The LLM compared not just lock counts, but lock type severity. It learned the hierarchy (X > SIX > S > IX > IS) from the system prompt and applied it to this specific scenario. Fixed rules can't do multi-dimensional reasoning.

The symmetric case was even more revealing. Two transactions: 8 locks each, identical start times, identical lock types. All three fixed rules degraded to tiebreaking — effectively random — but each still claimed it "followed the rule." The LLM was more honest:

"Both transactions are symmetric: 8 locks each, including X and SIX, identical start times. Choosing T32 arbitrarily to break the deadlock."

Fixed rules offer false certainty. The LLM admits when nothing differentiates the candidates. I prefer the latter.

Hallucination Defense

LLMs hallucinate, so I built three layers of defense:

Hard reject: victim must be in the deadlock cycle — code validates this. If not → fallback to MinLocks
Soft warn: _verify_claims() regex-extracts specific numbers from the LLM's reason (lock counts, timestamps), cross-checks against actual snapshot data. Mismatches get a [⚠] marker but don't reject the choice
Prevention: the prompt includes a code-computed structured summary as ground truth, plus a direct instruction not to fabricate resource names or business meanings

If the LLM completely fails → 0ms fallback to MinLocks. System availability is unaffected.

Bugs Worth Mentioning

DeepSeek thinking mode: v4-pro enables thinking by default. Thinking tokens count toward max_tokens. My initial max_tokens=1024 was entirely consumed by thinking (~3500 tokens), leaving an empty text response. Set to 8192 to fix.
Content block structure: thinking returns type='thinking', text=None as the first content block. Calling response.content[0].text — the standard SDK approach — returns None. The actual JSON is in a later type='text' block. Wrote _extract_text() to iterate all blocks.
httpx compatibility: anthropic 0.39 is incompatible with httpx 0.28+ (removed proxies parameter). Pinned httpx<0.28.

Tech Stack

Python 3 + asyncio + Anthropic SDK. rookieDB (Java) as the database backend.

No LangChain, CrewAI, or any agent framework — the LLM appears at exactly one step.

37 unit tests + 7 integration tests (live LLM calls). Full design docs and experiment reports in both English and Chinese. 6 architecture decision records (why I cut Multi-Agent, why Sidecar, why Rich-Context prompting, etc.).

GitHub: https://github.com/ShunheWang/dda

What's Next

PostgreSQL adaptation — swap \alllocks for pg_locks + pg_stat_activity. Every other component (WFG, cycle detection, victim selection, rollback) fully reuses. Then a direct comparison: "PG's built-in detection chose T42, DDA's LLM chose T43 because..." — that's the data I actually want.

If I can get transaction SQL text and affected row counts from the LockManager, the LLM could do even deeper analysis — "T1 updated the payments table but only 1 row affected vs T2 updating 5000 rows in orders."

I'm currently looking for backend/infrastructure/database roles. Location flexible — based in Wuhan, open to Shanghai, Beijing, remote. 5 years backend experience, details in my GitHub profile and resume. If your team has something relevant, I'd love to talk.