In Relational databases, combining data from multiple tables is done by Join operations which play a very important function in this regard. PostgreSQL provides a variety of Join Operations to take care of different join conditions. In this blog post which is a summary of the book The Internals of PostgreSQL, Chapter 3 Part 3, we will discuss three important join operations in PostgreSQL:
Nested Loop Join
Materialized Nested Loop Join
Indexed Nested Loop Join
Nested Loop Join:
In PostgreSQL, the nested loop join acts as the fundamental join operation, which is able of dealing with any kind of join conditions despite the fact that there are many other efficient variations.
The nested loop join is adaptable for many join conditions and does not need a start-up operation. Multiplying the sizes of the inner and outer tables is directly proportional to the run cost. The run cost equation takes into consideration the tuple costs, CPU costs, and scanning costs of inner and outer tables.
Materialized Nested Loop Join:
As scanning the whole inner table for each outer table tuple is an expensive process, PostgreSQL offers materialized nested loop join to reduce the collective scanning cost of the inner table. This variation scans the inner table only one time and the executor writes the inner table tuples in work_memory or temporary files.
- Eliminating the need of scanning the inner table for each tuple of the outer table, Materialized nested loop join reduces the overall scanning cost.
- Before carrying out the join operation, the executor writes the inner table tuples in memory or temporary files.
- Temporary tuple storage module for materializing tables, creating batches, and much more is internally offered by PostgreSQL.
- To give an exact and accurate estimation cost of the whole join operation, materialized nested loop join cost estimation takes in variables like start-up cost, run cost, and rescan cost.
Indexed Nested Loop Join:
When there is an index of the inner table offering the straightforward look-up of the tuples satisfying the join condition for matching each tuple of the outer table, the variation is known as indexed nested loop join.
This variation offers a process on the basis of a single loop of the outer table, which enhances the performance of join operations.
- Indexed Nested Loop Join uses the index search on the inner table tuples instead of the sequential scanning fulfilling the join condition.
- Indexed Nested Loop Join increases the join operation performance as it directly looks up for tuples, this is crucial when managing large datasets.
- Indexed Nested Loop Join cost estimation takes in the variables like lookup costs and scanning cost of the outer table which ensures accurate join operation’s cost.
Conclusion
To ensure optimal query processing in PostgreSQL, efficient join operations are important. Understanding the join operations help users such as developers, db administrators to choose most suitable approach for their specific use case.
To conclude this blog we discussed various join operations each of which offers uniques properties and its advantages, ensuring efficient data processing and using these join operations with PostgreSQL makes it more efficient.
NOTE: More in the next blog
Top comments (2)
Nice blog Suhaib!
Thank you.