Khairul Basar

Posted on Jan 20

How POSTGRES indexing is more efficient than MYSQL

#postgres #mysql #database #dbms

Problem setup (How Index Scan Internally works for both DBs)

Table in MySQL and PostgreSQL:

user(
  id   BIGINT PRIMARY KEY,
  name VARCHAR(100),
  age  INT
)

Index:

CREATE INDEX idx_user_name ON user(name);

Query:

SELECT age FROM user WHERE name = 'Rahim';

Assume:

Table has 10 million rows
Index is a B-Tree
name is not unique
age is not in the index

1️⃣ MySQL (InnoDB) — Step by step

🔑 Important InnoDB rule

Secondary index stores the PRIMARY KEY, not the row location

So idx_user_name looks like:

(name, primary_key_id)

Step-by-step execution

🧭 Step 1: Traverse secondary index (name)

MySQL searches the B-Tree for 'Rahim'.

Cost: O(log N)
Example:

B-Tree levels:
Root → Internal → Leaf

At the leaf, it finds:

('Rahim', id=73482)

⚠️ It does NOT have age yet

🧭 Step 2: Traverse PRIMARY KEY index (clustered index)

In InnoDB:

Primary key index IS the table

Rows are physically stored in PK order

Now MySQL uses id = 73482 to search the PK B-Tree.

Cost: O(log N)
At the PK leaf node, it finds the full row:

(id=73482, name='Rahim', age=29)

✅ Total cost (MySQL)

🔍 Key MySQL characteristics

✅ Clustered index = good locality
❌ Two tree traversals
❌ PK lookup cost grows with table size

2️⃣ PostgreSQL — Step by step

🔑 Important PostgreSQL rule

Index stores a TID (tuple ID) → pointer to heap location

Index entry looks like:

(name, (block_id, offset))

Step-by-step execution

🧭 Step 1: Traverse B-Tree index (name)

PostgreSQL searches index for 'Rahim'.

Cost: O(log N)

At the leaf, it finds:

('Rahim', TID=(block=102345, offset=7))

🧭 Step 2: Heap fetch (direct pointer)

Postgres now:

Goes directly to heap page 102345

Reads row at offset 7

Cost: O(1) (conceptually)

Row found:

(id=73482, name='Rahim', age=29)

✅ Total cost (PostgreSQL)

🔍 Key PostgreSQL characteristics

✅ One B-Tree traversal
✅ Heap fetch is constant-time
❌ Heap pages may be scattered (less locality)
❌ Extra visibility checks (MVCC)

3️⃣ Side-by-side comparison (core difference)

4️⃣ Real-world performance (THIS is the key insight)

⚠️ Big-O is not the full story

When MySQL can be faster

PK is small
Data fits in buffer pool
Rows are accessed sequentially
Heavy read workloads

➡️ Clustered index wins

When PostgreSQL can be faster

Very large tables
Many secondary indexes
Random access patterns
Index-only scans possible

➡️ Pointer-based access wins

5️⃣ Index-only scan (Postgres secret weapon 🧠)

If you change index to:

CREATE INDEX idx_user_name_age ON user(name, age);

Then:

SELECT age FROM user WHERE name = 'Rahim';

🔥 PostgreSQL can return result WITHOUT touching heap at all

Cost: O(log N)

MySQL cannot do true index-only scan in the same way because:
Secondary index does not contain row version visibility info

6️⃣ Final verdict (short answer)

📌 Theoretical efficiency

✅ PostgreSQL is more efficient

`O(logN) + O(1)

O(logN) + O(logN)`

📌 Practical efficiency

`Small / medium datasets → MySQL often feels faster

Large datasets + many indexes → PostgreSQL scales better

Analytics / complex queries → PostgreSQL wins

Simple OLTP workloads → MySQL is excellent`

🧠 One-line intuition

MySQL: “Find index → find PK → find row” Postgres: “Find index → jump straight to row”

DEV Community

How POSTGRES indexing is more efficient than MYSQL

Problem setup (How Index Scan Internally works for both DBs)

Assume:

1️⃣ MySQL (InnoDB) — Step by step

Step-by-step execution

✅ Total cost (MySQL)

🔍 Key MySQL characteristics

2️⃣ PostgreSQL — Step by step

Step-by-step execution

✅ Total cost (PostgreSQL)

🔍 Key PostgreSQL characteristics

3️⃣ Side-by-side comparison (core difference)

4️⃣ Real-world performance (THIS is the key insight)

5️⃣ Index-only scan (Postgres secret weapon 🧠)

6️⃣ Final verdict (short answer)

📌 Practical efficiency

🧠 One-line intuition

Top comments (0)