Pages

Wednesday, June 4, 2008

Join algorithms

Three fundamental algorithms exist for performing a join operation.

Nested loops
Use of nested loops produces the simplest join-algorithm. For each tuple in the outer join relation, the system scans the entire inner-join relation and appends any tuples that match the join-condition to the result set. Naturally, this algorithm performs poorly with large join-relations: inner or outer or both. An index on columns in the inner relation in the join-predicate can enhance performance.

The "block nested loops" (BNL) approach offers a refinement to this technique: for every block in the outer relation, the system scans the entire inner relation. For each match between the current inner tuple and one of the tuples in the current block of the outer relation, the system adds a tuple to the join result-set. This variant means doing more computation for each tuple of the inner relation, but far fewer scans of the inner relation.


Merge join
If both join relations come in order, sorted by the join attribute(s), the system can perform the join trivially, thus:

Consider the current "group" of tuples from the inner relation; a group consists of a set of contiguous tuples in the inner relation with the same value in the join attribute.
For each matching tuple in the current inner group, add a tuple to the join result. Once the inner group has been exhausted, advance both the inner and outer scans to the next group.
Merge joins offer one reason why many optimizers keep track of the sort order produced by query plan operators—if one or both input relations to a merge join arrives already sorted on the join attribute, the system need not perform an additional sort. Otherwise, the DBMS will need to perform the sort, usually using an external sort to avoid consuming too much memory.

Hash join

A hash join algorithm can produce equi-joins. The database system pre-forms access to the tables concerned by building hash tables on the join-attributes. The lookup in hash tables operates much faster than through index trees. However, one can compare hashed values only for equality, not for other relationships.

No comments: