DEV Community

Anshu Aditya
Anshu Aditya

Posted on

I Built a Free PostgreSQL Internals Course — Here's What I Learned

Ever wondered what actually happens when you run SELECT * FROM users?

I did too. So I spent weeks diving into PostgreSQL's source code, documentation, and internals. Instead of keeping these learnings to myself, I documented everything into a comprehensive course.

The result: 33 detailed lessons covering PostgreSQL from SQL parsing to disk I/O.

🐘 Why PostgreSQL Internals?

Most developers treat databases as black boxes. You write SQL, and either it's fast or it's slow. If it's slow, you add an index and hope for the best.

But understanding how PostgreSQL works changes everything:

  • You'll know why your query is slow, not just that it's slow
  • You'll make informed decisions about indexes, schema design, and configuration
  • You'll debug production issues with confidence
  • You'll read EXPLAIN output like a pro

📚 What's Inside

The course covers the complete PostgreSQL architecture:

Query Processing Pipeline

  • Parser → Transforms SQL text into an AST
  • Analyzer → Semantic analysis and name resolution
  • Rewriter → View expansion and rule application
  • Planner → Cost-based query optimization
  • Executor → Actually runs the query plan

Storage Engine

  • Heap Storage — How tables are stored on disk
  • Page Layout — The 8KB building blocks
  • TOAST — How PostgreSQL handles large values
  • HOT Updates — Optimizing updates without index changes

Concurrency & Transactions

  • MVCC — How reads don't block writes
  • Transaction Isolation — Read Committed vs Serializable
  • Locking — Table locks, row locks, advisory locks
  • VACUUM — Why dead tuples accumulate and how to clean them

Performance

  • B-tree, GIN, GiST, BRIN Indexes — When to use each
  • Join Algorithms — Nested Loop, Hash, Merge
  • Parallel Query — Using multiple cores
  • Cost Model — How the planner estimates query costs

Operations

  • Streaming & Logical Replication — High availability
  • Partitioning — Managing large tables
  • Connection Pooling — Scaling connections with PgBouncer
  • Background Processes — Autovacuum, checkpointer, WAL writer

🎯 Sample: How MVCC Works

Here's a taste of what's in the course. Every row in PostgreSQL has hidden system columns:

-- You can actually see these!
SELECT xmin, xmax, ctid, * FROM users LIMIT 1;
Enter fullscreen mode Exit fullscreen mode
  • xmin: Transaction ID that created this row
  • xmax: Transaction ID that deleted/updated this row (0 if still live)
  • ctid: Physical location (page, offset)

When you UPDATE a row, PostgreSQL doesn't modify it in place. It creates a NEW version:

Old row: xmin=100, xmax=101, data="Alice"   ← Now dead
New row: xmin=101, xmax=0,   data="Alicia"  ← Current version
Enter fullscreen mode Exit fullscreen mode

This is MVCC — old versions are kept for concurrent readers. VACUUM cleans them up later.

💡 Key Insights

After going through the internals, here are my biggest takeaways:

1. Everything is a page
PostgreSQL stores everything in 8KB pages — heap data, indexes, even system catalogs.

2. Updates are actually INSERT + DELETE
MVCC means updates create new row versions. That's why VACUUM exists.

3. The planner is cost-based
It estimates I/O and CPU costs based on statistics. Wrong statistics = wrong plans.

4. Index scans aren't always better
For queries returning >15% of rows, sequential scans often win.

5. HOT updates are magic
Updates to non-indexed columns skip index maintenance entirely.

🔧 Each Lesson Includes

  • Conceptual explanations with ASCII diagrams
  • Real examples you can run
  • Source code references to PostgreSQL internals
  • Hands-on exercises
  • Key takeaways

📊 Course Stats

Metric Count
Lessons 33
Modules 14
Lines of content 22,000+
Hands-on exercises 150+
Cost FREE

🔗 Get the Course

The entire course is open source on GitHub:

GitHub logo Iamanshuaditya / postgresql-deep-dive

Master PostgreSQL Internals: A comprehensive deep-dive from SQL parsing to disk I/O. Covers MVCC, WAL, VACUUM, Indexing, and Query Execution.

PostgreSQL Internals Course

A comprehensive deep-dive into PostgreSQL's internal architecture, from SQL parsing to disk I/O.

🎯 What You'll Learn

This course takes you through PostgreSQL's source code and internal mechanisms:

  • Query Processing Pipeline: Parser → Analyzer → Rewriter → Planner → Executor
  • Storage Engine: Heap storage, page layout, TOAST, tablespaces
  • Index Internals: B-tree, GIN, GiST, BRIN, Hash indexes
  • Memory Management: Buffer pool, shared memory, work_mem
  • MVCC & Concurrency: Transaction isolation, visibility, locking
  • WAL & Durability: Write-ahead logging, checkpoints, recovery
  • VACUUM & Maintenance: Dead tuple cleanup, XID wraparound prevention
  • Query Optimization: Join algorithms, parallel execution, cost model
  • Replication: Streaming and logical replication
  • Advanced Topics: Partitioning, extensions, connection pooling

📚 Course Structure

Module 1: Foundation — From SQL to Execution





Clone it, read it offline, contribute improvements, or just give it a ⭐!


Have questions about PostgreSQL internals? Drop them in the comments!

What topic would you like me to cover next?

Top comments (0)