DEV Community

Mohammad Atif
Mohammad Atif

Posted on • Originally published at mohammad-atif.vercel.app

PostgreSQL Primary Keys-Why Your UUID Choice Matters

Before diving in

So before diving into the topic there are some base information that we need to be aware of.

1. PostgreSQL indexes

PostgreSQL has various types of indexes and different indexes have different strengths and weaknesses. PostgreSQL uses B-Tree index for indexing the primary key.

Important: PRIMARY KEY and UNIQUE constraints create an index by default.

2. What is a B-Tree?

The term “B-Tree” is short for “balanced tree” indicating that the distance between each node and the root is consistent across all levels. Furthermore, the root and its parent nodes can have more than two children, which effectively minimizes the depth of the tree, enhancing search efficiency.

Let’s see the main aspects about B-Trees:

Main characteristics

  • Node Flexibility: Nodes can have more than two children, typically varying from a minimum of two to a maximum defined by the tree order (M).
  • Height Management: Automatically adjusts to maintain a height of logM N, optimizing search operations.
  • Sorting Order: Maintains data in sorted order, ensuring the smallest values are on the left and the largest on the right.
  • Uniformity and Efficiency: All leaf nodes are at the same level to ensure efficiency and consistency, and there are no empty subtrees above the leaf nodes.

What it is best at

  • Totally ordered data
  • Equality, range, sorting

Advantages

  • Supports =, <, <=, >, >=
  • Supports ORDER BY natively
  • Supports UNIQUE / PRIMARY KEY
  • Supports multi-column indexes
  • Can do index-only scans
  • Backward scan supported
  • Very stable performance
  • Balanced tree → predictable depth

Disadvantages

  • Inefficient for:

    • Full-text search
    • Arrays
    • JSON path queries
    • Geometric data
  • Not useful for highly unselective columns (e.g. boolean)

Performance characteristics

  • Lookup: O(log N)
  • Range scan: extremely efficient
  • Insert cost: moderate (page splits)
  • Update cost: moderate
  • Storage: medium

Typical use cases

  • Primary keys
  • Foreign keys
  • Dates & timestamps
  • Numeric ranges
  • ORDER BY + LIMIT queries
  • Pagination

Important observation

Notice any pattern — if we somehow get data that is already sorted we could use that to our advantage as we could predict how the data would be stored thus, performing quick lookups along with other features such as equality, range, and other discussed above.

But at the same time if the data stored has no correlation making these other important functionalities go to waste.

3. Types of unique identifiers and their features

Types of unique identifiers and their features

So if we use a time based UUID we can still avoid collision while leveraging full potential of B-Tree indexing.

Since every newly created UUID would be greater than the previous UUID it would be stored in the rightmost node of the tree which PostgreSQL optimizes for us.

What changes in practice?

Now for lookups both of them would give the result in nearly same time with nearly same index size but the magic happens next.

  • Search object created at a particular day and time from index without using another index on createdAt key.
  • If we need to debloat our data set due to storage cost we can even remove createdAt key.
  • Search objects with time based range.
  • Since all new UUIDs are inserted in rightmost node, insert cost is reduced.
  • Improved VACUUM and bloat behavior.
  • More predictable pagination.

When IDs are time-ordered:


  WHERE id > last_seen_id

Enter fullscreen mode Exit fullscreen mode

Closing thought

In simpler terms time based UUIDs transforms a liability into a useful performance commodity.

Top comments (0)