Arpit Godghate

Posted on Apr 10

I Built a 20-Hour DBMS Interview Prep System Using LLMs — Does It Actually Work?

#interview #database #backend #ai

I used LLMs to build a complete DBMS interview prep system. Here's exactly how - and I want your honest feedback.

Instead of randomly Googling "DBMS interview questions", I ran an experiment using LLMs as my study partner. The results surprised me.

Step 1: Curate the right questions
I asked the LLM: "What are the most frequently asked DBMS questions in senior backend interviews?", not once, but iteratively. I cross-referenced across difficulty levels, topics, and interview formats until I had a distilled list of 100 questions spanning 10 modules - from basics like normalisation all the way to replication, sharding, and MVCC.

Step 2: Find the minimal set of resources
Here's where it got interesting. I gave the LLM my 100 questions and asked: "What is the smallest set of resources that covers all of these with zero overlap?" It mapped every question to exactly 3 resources:

InterviewBit DBMS/SQL articles (free) - for fundamentals + SQL practice
DDIA by Martin Kleppmann (only 5 specific chapters) - for the deep "why" behind transactions, indexing, replication, and storage
LeetCode Top SQL 50 - for hands-on query practice

Total estimated time: ~20 hours. No fluff, no 40-hour courses.

Step 3: Interview-style learning
This was the game-changer. After studying, I asked the LLM to act as an interviewer:

It asked me each question one by one
I answered as if I were in a real interview
It evaluated my response - not just for correctness, but for what a recruiter at a senior level would actually want to hear (trade-offs, real-world examples, depth vs. rambling)

This feedback loop forced me to articulate answers clearly instead of just "knowing" the concept in my head.

Why I'm sharing this:
I genuinely don't know if this approach is better or worse than traditional prep. It felt efficient, but I want to pressure-test it with people who've been on the other side of the table.

A few specific questions for you:

If you've interviewed candidates, does this kind of structured prep actually show in interviews, or does it come across as rehearsed?
Are there blind spots in using an LLM as both curriculum designer and mock interviewer?
What would you add or change to this method?

I'll share the full 100-question study guide with the 7-day plan in the comments if anyone wants it.

Would love to hear what's worked (or not worked) for you.

Top comments (2)

Arshi Sabah • Apr 10

Please share the questions

Arpit Godghate • Apr 11

here you go! let me know what you think about the study material that I curated.
Also, one nuance I noticed is that back and forth conversation per question consumes a lot of credits
So, I would suggeste answering 5 at once.

Module 1: DBMS Fundamentals (Q1–Q10)

What is a DBMS? How does it differ from a file system?
What is RDBMS? Name examples and explain how it differs from DBMS.
What are the different types of DBMS (Hierarchical, Network, Relational, Object-Oriented)?
Explain the 3-tier architecture of a DBMS (Physical, Logical, View).
What are the different types of database languages (DDL, DML, DCL, TCL)?
What is a schema vs an instance in DBMS?
What is data abstraction and what are its three levels?
What is the Entity-Relationship (ER) model? Draw and explain an ER diagram.
How does an ER diagram translate to a relational schema?
What is the difference between a 2-tier and 3-tier database architecture?

Module 2: Relational Model & Keys (Q11–Q20)

What is a relation/table? Explain rows, columns, tuples, attributes.
What is a primary key and why is it important?
What is a foreign key? Explain referential integrity.
What is a candidate key vs a super key?
What is a composite key? When do you use one?
What is the difference between a primary key and a unique key?
What are the different types of relationships (1:1, 1:N, M:N)?
What are database constraints (NOT NULL, UNIQUE, CHECK, DEFAULT, FOREIGN KEY)?
What is referential integrity and cascading actions (ON DELETE CASCADE)?
What is a surrogate key vs a natural key? Pros and cons.

Module 3: Normalization & Schema Design (Q21–Q30)

What is normalization? Why is it important?
Explain 1NF, 2NF, 3NF with examples.
What is BCNF? How does it differ from 3NF?
What are 4NF and 5NF? (Multi-valued and join dependencies)
What is denormalization? When and why would you denormalize?
What is data redundancy and how does normalization reduce it?
What are functional dependencies, partial dependencies, and transitive dependencies?
How do you design a database schema for an e-commerce application? (practical)
What are the trade-offs between normalized and denormalized schemas in production?
How do you handle schema evolution/migrations with minimal downtime?

Module 4: SQL Querying (Q31–Q45)

What is the difference between WHERE and HAVING?
Explain INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN, CROSS JOIN with examples.
What causes duplicate rows after a JOIN and how do you fix it?
What is a correlated subquery? When is it slower than a JOIN?
UNION vs UNION ALL — performance and use cases.
How do you write a Top-N per group query? (window functions)
What is a CTE (Common Table Expression)? How does it differ from a subquery?
What is a VIEW? When should you avoid using views?
What is a materialized view and what are its trade-offs?
Explain aggregate functions (COUNT, SUM, AVG, MIN, MAX) and GROUP BY.
What is the difference between DELETE, TRUNCATE, and DROP?
What is a prepared/parameterized statement? Why use it?
What is the N+1 query problem and how do you solve it?
How do window functions work (ROW_NUMBER, RANK, DENSE_RANK, LEAD, LAG)?
How do you fetch duplicate records from a table?

Module 5: Indexing & Query Optimization (Q46–Q58)

What is an index? How does it work internally (B-tree, B+ tree)?
What is the difference between a clustered and non-clustered index?
What is a composite index? Explain the leftmost prefix rule.
What is a covering index? How does it eliminate table lookups?
What is selectivity/cardinality and why does it matter for indexing?
When does a query NOT use an index even if one exists?
Index scan vs index seek — when do they occur?
What is index fragmentation/bloat and how do you fix it?
What is an execution plan (EXPLAIN/EXPLAIN ANALYZE)? What do you check first?
Why can too many indexes slow down INSERT/UPDATE/DELETE?
What is a hash index? When would you use it over B-tree?
What is a full-text index and when is it appropriate?
Common production reasons for slow queries and debugging flow.

Module 6: Transactions, Isolation & Concurrency (Q59–Q70)

What is a transaction? Explain ACID properties in depth.
What does autocommit mean? When should you disable it?
Explain the four isolation levels (Read Uncommitted, Read Committed, Repeatable Read, Serializable).
What is a dirty read, non-repeatable read, and phantom read?
What is a lost update and how do you prevent it?
Optimistic locking vs pessimistic locking — when to use each.
What is MVCC (Multi-Version Concurrency Control)? How does Postgres/MySQL implement it?
What is a deadlock? How do databases detect and resolve deadlocks?
What is lock escalation and why can it slow systems down?
What is write skew? Why is it tricky even under Repeatable Read?
What is COMMIT, ROLLBACK, and SAVEPOINT?
When should you use a read-only transaction or a snapshot?

Module 7: Storage, Logging & Recovery (Q71–Q80)

What is Write-Ahead Logging (WAL) and why is it critical for durability?
What is the difference between redo log and undo log?
What are checkpoints and how do they reduce crash recovery time?
What happens during crash recovery? (high-level steps)
What is Point-in-Time Recovery (PITR)?
Full vs incremental backups — trade-offs and when to use each.
Logical backup vs physical backup — differences and use cases.
What is the difference between replication and backup?
What are RPO and RTO? Why do interviewers ask about them?
How do database storage engines work (InnoDB vs MyISAM, LSM-tree vs B-tree)?

Module 8: Replication, Sharding & Distributed DB (Q81–Q90)

What is replication? Why do teams add read replicas?
Leader-follower vs multi-leader replication — trade-offs.
What is replication lag and how should applications handle it?
What is partitioning/sharding? When should you shard?
Hash-based vs range-based sharding — trade-offs.
What is a hot shard/partition? How do you mitigate it?
What is eventual consistency? When is it acceptable?
Strong reads vs stale reads — when to use each.
Why are distributed transactions hard? How do teams avoid them (Saga pattern, etc.)?
How do you design a multi-region, highly available database setup?

Module 9: SQL vs NoSQL & CAP Theorem (Q91–Q96)

SQL vs NoSQL — when to use which?
What is the CAP theorem? Explain with real-world examples.
What is BASE (Basically Available, Soft state, Eventually consistent)?
When would you pick MongoDB / DynamoDB / Cassandra / Redis over PostgreSQL?
What is a document database vs a column-family store vs a key-value store vs a graph database?
How do you model data differently in NoSQL vs SQL?

Module 10: Security, Governance & Practical Production (Q97–Q100)

What is SQL injection? How do parameterized queries prevent it?
What is row-level security? Explain a real use case.
DB authentication vs authorization — how do roles work in practice?
What are stored procedures and triggers? When should and shouldn't you use them?

ONLY 3 RESOURCES (Zero Overlap)

Each resource has a unique job. No two cover the same ground.

Resource 1: InterviewBit DBMS Questions (FREE — Web Article)

interviewbit.com/dbms-interview-qu...

Covers: Q1–Q30 (fundamentals, keys, normalization, ER model) + Q97–Q100 (security)

Time: ~3 hours to read through

Why this one: Fastest way to nail the theory questions. Direct Q&A format, no fluff. Skip everything DDIA/courses would be overkill for.

Resource 2: DDIA — Designing Data-Intensive Applications (Book — only 5 chapters)

By Martin Kleppmann. Read ONLY these chapters:

Ch 2 (Data Models & Query Languages) → Q8–Q9, Q17, Q28–Q29, Q91–Q96

Ch 3 (Storage & Retrieval) → Q46–Q58, Q71–Q80

Ch 5 (Replication) → Q81–Q83, Q87–Q88

Ch 6 (Partitioning) → Q84–Q86, Q89–Q90

Ch 7 (Transactions) → Q59–Q70 ⭐ MOST IMPORTANT CHAPTER

Time: ~12–15 hours for these 5 chapters

Why this one: The only resource that explains the WHY behind transactions, isolation, replication, sharding, and storage engines at interview depth. Nothing else comes close for Q46–Q96.

Resource 3: InterviewBit SQL Questions + LeetCode SQL (FREE)

interviewbit.com/sql-interview-que...
leetcode.com/studyplan/top-sql-50/

Covers: Q31–Q45 (SQL querying, JOINs, window functions, CTEs, N+1)

Time: ~4–5 hours (read article + solve 20–30 problems)

Why this one: SQL querying is a hands-on skill. Reading theory isn't enough. The InterviewBit article teaches syntax/concepts, LeetCode drills them in.

That's it. Total time: ~20–23 hours.

7-DAY SPEED-RUN PLAN (~3 hrs/day)

Day	What to do	Questions Covered
1	Read InterviewBit DBMS article (basic + intermediate sections)	Q1–Q30
2	DDIA Ch 7 — Transactions (the single most important chapter)	Q59–Q70
3	DDIA Ch 3 — Storage & Retrieval (B-trees, LSM, WAL, indexes)	Q46–Q58, Q71–Q80
4	DDIA Ch 5 + Ch 6 — Replication & Partitioning	Q81–Q90
5	DDIA Ch 2 — Data Models + InterviewBit Security section	Q91–Q100
6	InterviewBit SQL article + solve 20 LeetCode SQL problems	Q31–Q45
7	Full revision: answer all 100 questions out loud, 2 min each	All

QUESTION → RESOURCE CHEAT SHEET

Questions	Read This	Time
Q1–Q30 (fundamentals, keys, normalization)	InterviewBit DBMS article	3 hrs
Q31–Q45 (SQL querying)	InterviewBit SQL article + LeetCode Top SQL 50	4–5 hrs
Q46–Q58 (indexing, query optimization)	DDIA Ch 3	3 hrs
Q59–Q70 (transactions, isolation, concurrency)	DDIA Ch 7 ⭐	3 hrs
Q71–Q80 (storage, WAL, recovery, backups)	DDIA Ch 3 (second half)	included above
Q81–Q90 (replication, sharding, distributed)	DDIA Ch 5 + Ch 6	3 hrs
Q91–Q96 (SQL vs NoSQL, CAP)	DDIA Ch 2	2 hrs
Q97–Q100 (security, triggers, procedures)	InterviewBit DBMS article (advanced section)	1 hr

KEY TIPS

DDIA Ch 7 is your highest-ROI read. Isolation levels, MVCC, lost updates, write skew — come up in almost every senior backend interview.
Don't just define — explain trade-offs. "Sharding helps scale writes BUT introduces cross-shard query complexity."
Practice SQL by solving, not reading. LeetCode Top SQL 50 is enough.
Connect DB knowledge to system design. "I'd use Postgres with read replicas for the read-heavy path, shard by user_id at 500M rows."