Skip to content

DEV Community

Anshu Aditya

Posted on Jan 28

I Built a Free PostgreSQL Internals Course — Here's What I Learned

#postgres #database #learning #opensource

Ever wondered what actually happens when you run SELECT * FROM users?

I did too. So I spent weeks diving into PostgreSQL's source code, documentation, and internals. Instead of keeping these learnings to myself, I documented everything into a comprehensive course.

The result: 33 detailed lessons covering PostgreSQL from SQL parsing to disk I/O.

🐘 Why PostgreSQL Internals?

Most developers treat databases as black boxes. You write SQL, and either it's fast or it's slow. If it's slow, you add an index and hope for the best.

But understanding how PostgreSQL works changes everything:

You'll know why your query is slow, not just that it's slow
You'll make informed decisions about indexes, schema design, and configuration
You'll debug production issues with confidence
You'll read EXPLAIN output like a pro

📚 What's Inside

The course covers the complete PostgreSQL architecture:

Query Processing Pipeline

Parser → Transforms SQL text into an AST
Analyzer → Semantic analysis and name resolution
Rewriter → View expansion and rule application
Planner → Cost-based query optimization
Executor → Actually runs the query plan

Storage Engine

Heap Storage — How tables are stored on disk
Page Layout — The 8KB building blocks
TOAST — How PostgreSQL handles large values
HOT Updates — Optimizing updates without index changes

Concurrency & Transactions

MVCC — How reads don't block writes
Transaction Isolation — Read Committed vs Serializable
Locking — Table locks, row locks, advisory locks
VACUUM — Why dead tuples accumulate and how to clean them

Performance

B-tree, GIN, GiST, BRIN Indexes — When to use each
Join Algorithms — Nested Loop, Hash, Merge
Parallel Query — Using multiple cores
Cost Model — How the planner estimates query costs

Operations

Streaming & Logical Replication — High availability
Partitioning — Managing large tables
Connection Pooling — Scaling connections with PgBouncer
Background Processes — Autovacuum, checkpointer, WAL writer

🎯 Sample: How MVCC Works

Here's a taste of what's in the course. Every row in PostgreSQL has hidden system columns:

-- You can actually see these!
SELECT xmin, xmax, ctid, * FROM users LIMIT 1;

xmin: Transaction ID that created this row
xmax: Transaction ID that deleted/updated this row (0 if still live)
ctid: Physical location (page, offset)

When you UPDATE a row, PostgreSQL doesn't modify it in place. It creates a NEW version:

Old row: xmin=100, xmax=101, data="Alice"   ← Now dead
New row: xmin=101, xmax=0,   data="Alicia"  ← Current version

This is MVCC — old versions are kept for concurrent readers. VACUUM cleans them up later.

💡 Key Insights

After going through the internals, here are my biggest takeaways:

1. Everything is a page
PostgreSQL stores everything in 8KB pages — heap data, indexes, even system catalogs.

2. Updates are actually INSERT + DELETE
MVCC means updates create new row versions. That's why VACUUM exists.

3. The planner is cost-based
It estimates I/O and CPU costs based on statistics. Wrong statistics = wrong plans.

4. Index scans aren't always better
For queries returning >15% of rows, sequential scans often win.

5. HOT updates are magic
Updates to non-indexed columns skip index maintenance entirely.

🔧 Each Lesson Includes

Conceptual explanations with ASCII diagrams
Real examples you can run
Source code references to PostgreSQL internals
Hands-on exercises
Key takeaways

📊 Course Stats

Metric	Count
Lessons	33
Modules	14
Lines of content	22,000+
Hands-on exercises	150+
Cost	FREE

🔗 Get the Course

The entire course is open source on GitHub:

Iamanshuaditya / postgresql-deep-dive

Master PostgreSQL Internals: A comprehensive deep-dive from SQL parsing to disk I/O. Covers MVCC, WAL, VACUUM, Indexing, and Query Execution.

PostgreSQL Internals Course

A comprehensive deep-dive into PostgreSQL's internal architecture, from SQL parsing to disk I/O.

🎯 What You'll Learn

This course takes you through PostgreSQL's source code and internal mechanisms:

Query Processing Pipeline: Parser → Analyzer → Rewriter → Planner → Executor
Storage Engine: Heap storage, page layout, TOAST, tablespaces
Index Internals: B-tree, GIN, GiST, BRIN, Hash indexes
Memory Management: Buffer pool, shared memory, work_mem
MVCC & Concurrency: Transaction isolation, visibility, locking
WAL & Durability: Write-ahead logging, checkpoints, recovery
VACUUM & Maintenance: Dead tuple cleanup, XID wraparound prevention
Query Optimization: Join algorithms, parallel execution, cost model
Replication: Streaming and logical replication
Advanced Topics: Partitioning, extensions, connection pooling

📚 Course Structure

Module 1: Foundation — From SQL to Execution

Lesson	Topic
1	How PostgreSQL Processes a Query
2	The Parser — Transforming SQL Into Internal Structures
3	The Analyzer —

…

Clone it, read it offline, contribute improvements, or just give it a ⭐!

Have questions about PostgreSQL internals? Drop them in the comments!

What topic would you like me to cover next?

Top comments (0)

Subscribe