DEV Community

Cover image for How POSTGRES indexing is more efficient than MYSQL
Khairul Basar
Khairul Basar

Posted on

How POSTGRES indexing is more efficient than MYSQL

Problem setup (How Index Scan Internally works for both DBs)

Table in MySQL and PostgreSQL:

user(
  id   BIGINT PRIMARY KEY,
  name VARCHAR(100),
  age  INT
)
Enter fullscreen mode Exit fullscreen mode

Index:

CREATE INDEX idx_user_name ON user(name);

Enter fullscreen mode Exit fullscreen mode

Query:

SELECT age FROM user WHERE name = 'Rahim';

Enter fullscreen mode Exit fullscreen mode

Assume:

  • Table has 10 million rows
  • Index is a B-Tree
  • name is not unique
  • age is not in the index

1️⃣ MySQL (InnoDB) — Step by step

🔑 Important InnoDB rule

Secondary index stores the PRIMARY KEY, not the row location

So idx_user_name looks like:

(name, primary_key_id)


Step-by-step execution

🧭 Step 1: Traverse secondary index (name)

MySQL searches the B-Tree for 'Rahim'.

Cost: O(log N)

Example:

B-Tree levels:
Root → Internal → Leaf

At the leaf, it finds:

('Rahim', id=73482)

⚠️ It does NOT have age yet

🧭 Step 2: Traverse PRIMARY KEY index (clustered index)

In InnoDB:

Primary key index IS the table

Rows are physically stored in PK order

Now MySQL uses id = 73482 to search the PK B-Tree.

Cost: O(log N)

At the PK leaf node, it finds the full row:

(id=73482, name='Rahim', age=29)


✅ Total cost (MySQL)

🔍 Key MySQL characteristics

  • ✅ Clustered index = good locality
  • ❌ Two tree traversals
  • ❌ PK lookup cost grows with table size

2️⃣ PostgreSQL — Step by step

🔑 Important PostgreSQL rule

Index stores a TID (tuple ID) → pointer to heap location

Index entry looks like:

(name, (block_id, offset))

Step-by-step execution

🧭 Step 1: Traverse B-Tree index (name)

PostgreSQL searches index for 'Rahim'.

Cost: O(log N)

At the leaf, it finds:

('Rahim', TID=(block=102345, offset=7))

🧭 Step 2: Heap fetch (direct pointer)

Postgres now:

Goes directly to heap page 102345

Reads row at offset 7

Cost: O(1) (conceptually)

Row found:

(id=73482, name='Rahim', age=29)

✅ Total cost (PostgreSQL)

🔍 Key PostgreSQL characteristics

  • ✅ One B-Tree traversal
  • ✅ Heap fetch is constant-time
  • ❌ Heap pages may be scattered (less locality)
  • ❌ Extra visibility checks (MVCC)

3️⃣ Side-by-side comparison (core difference)


4️⃣ Real-world performance (THIS is the key insight)

⚠️ Big-O is not the full story

When MySQL can be faster

  • PK is small
  • Data fits in buffer pool
  • Rows are accessed sequentially
  • Heavy read workloads

➡️ Clustered index wins

When PostgreSQL can be faster

  • Very large tables
  • Many secondary indexes
  • Random access patterns
  • Index-only scans possible

➡️ Pointer-based access wins


5️⃣ Index-only scan (Postgres secret weapon 🧠)

If you change index to:

CREATE INDEX idx_user_name_age ON user(name, age);

Enter fullscreen mode Exit fullscreen mode

Then:

SELECT age FROM user WHERE name = 'Rahim';

Enter fullscreen mode Exit fullscreen mode

🔥 PostgreSQL can return result WITHOUT touching heap at all

Cost:
O(log N)

MySQL cannot do true index-only scan in the same way because:
Secondary index does not contain row version visibility info


6️⃣ Final verdict (short answer)

📌 Theoretical efficiency

✅ PostgreSQL is more efficient

`O(logN) + O(1)

vs

O(logN) + O(logN)`

📌 Practical efficiency

`Small / medium datasets → MySQL often feels faster

Large datasets + many indexes → PostgreSQL scales better

Analytics / complex queries → PostgreSQL wins

Simple OLTP workloads → MySQL is excellent`


🧠 One-line intuition

MySQL: “Find index → find PK → find row”
Postgres: “Find index → jump straight to row”

Top comments (0)