Nested-Loop Join Algorithm

In our previous section, we learned about joins and various types of joins. In this section, we will know about the nested loop join algorithm.

A nested loop join is a join that contains a pair of nested for loops. To perform the nested loop join i.e., θ on two relations r and s, we use an algorithm known as the Nested loop join algorithm. The computation takes place as:

r ⋈ θ s

where r is known as the outer relation and s is the inner relation of the join. It is because the for loop of r encloses the for loop of s.

Nested-Loop Join Algorithm

Let's discuss the algorithm of nested-loop join:

for each tuple tr in r do begin
        for each tuple ts in s do begin
              test pair (tr, ts) to test if they satisfy the given join condition ?
              if test satisfied
                 add tr . ts to the result;
      end inner loop
end outer loop

In the algorithm, tr and ts are the tuples of relations r and s, respectively. The notation tr. ts is a tuple constructed by concatenating the attribute values of tuples tr and ts.

With the help of the algorithm, we understood the following points:

  • The nested-loop join does not need any indexing similar to a linear file scan for accessing the data.
  • Nested-loop join does not care about the given join condition. It is suitable for each given join condition.
  • The nested-loop join algorithm is expensive in nature. It is because it computes and examines each pair of tuples in the given two relations.

Cost Analysis of Nested-loop Join Algorithm

For analyzing the cost of the nested-loop join algorithm, consider a number of pairs of tuples as nr * ns. Here, nr specifies the number of tuples in relation r and ns specifies the number of tuples in relation s. For computing the cost, perform a complete scan on relation s. Thus,

Total number of block transfers in worst case = nr * bs + br

Total number of seeks required in worst case = nr + br

Here, bs and br are the number of blocks holding the tuples of relation r and s, respectively.

In the best case, both relations r and s have sufficient memory to fit in the memory simultaneously. So, each block will read only once. Thus,

Total number of block transfers in best case = br + bs.

Total number of seeks required = 2(nr + br)

In case if any one of the relations given fits entirely into the memory, it is a must to use that relation as the inner relation. It is because we will read the inner relation only once. Thus,

Total number of block transfers in such case = br + bs

Total number of seeks required = 2(nr + br)

Block Nested-Loop Join

Block Nested-Loop Join is a variant of nested-loop join in which each block of the inner relation is paired with each block of the outer relation. The block nested-loop join saves major block access in a situation where the buffer size is small enough to hold the entire relation into the memory. It does so by processing the relations on the basis of per block rather on the basis of per tuple. Within each pair of blocks, the block nested-loop join pairs each tuple of one block with each tuple in the other block to produce all pairs of tuples. It pairs only those tuples that satisfy the given join condition and them to the result.

Block Nested-Loop Join Algorithm

The algorithm that is used to perform the block nested-loop join is known as the Block Nested-Loop Join algorithm. We will use the same relations r and s in this algorithm.

           for each block br of r do begin
             for each block bs of s do begin
               for each tuple tr in br do begin
                 for each tuple ts in bs do begin
                   test pair (tr, ts) to determine if they pass the given join condition
                    if test passed
                   add tr . ts to the result;
             end
           end
        end
   end

Cost Analysis of Block Nested-Loop Join Algorithm

There is a major difference between the cost of block nested-loop join and nested loop-join algorithm. In the worst case of block nested-loop join, each block in the inner relation s is read only for one time for each block in the outer relation r. On the other hand, the nested-loop join reads each tuple in the inner relation s for one time for each tuple in the outer relation r. Thus in block nested-loop join,

Total number of block transfers in worst case = br * bs+ br

Total number of seeks required = 2 * br

Here, br and bs are the number of blocks holding records of the given relation r and s, respectively. Also, each scan of s (inner relation) needs only one seek, and r (outer relation) requires one seek per block. In the best case, the inner relation fits entirely into memory. Thus,

Total number of block transfers in best case = br + bs

Total number of seeks required = 2(nr + br)

In the case where none of the given relations r and s fits entirely into the memory, it is efficient to use the inner relation i.e., s as the outer relation.

Improving Performance of Nested-Loop and Block Nested-Loop Joins

After understanding both joins, it was evaluated that the performance of both the joins can be further improved:

  1. If in an equijoin or natural join, the join attributes form a key on the given inner relation s, then as soon the first match is found, the inner loop terminates for each outer relation tuple.
  2. Instead of using disk blocks in the block nested-loop join algorithm, we can use the biggest size that could fit into memory and also leave enough space for the buffers of the inner relation s and its output. As a result, it will reduce the number of scans of the inner relation and also minimizes the cost.
  3. We can perform scanning of the inner loop in both forward and backward direction in an alternate manner. This approach reduces the number of requirements of disk accesses by keeping the requests for disk blocks in order. Ordering of the request also helps to reuse the remaining data left in the buffer after its previous scan.
  4. If an index is present on the join attribute of the inner loop, we can replace the file scans with highly efficient index lookups. Such type of join method is known as indexed nested-loop join. Indexed nested-loop join can be used either with existing indexes or temporary indexes, which are created for the evaluation of the joins. It is an optimized technique for improving the performance of the nested-loop join.