Double-Pipelined Join Algorithm

In the previous section, we discussed pipelining and how a system creates and implements a pipeline to evaluate multiple operations, using either demand-driven or producer-driven pipelining. Here, we discuss an evaluation algorithm that supports pipelining.

Some of the operations used to access data in a system are inherently blocking, while others are not. A blocking operation does not output any result until it has examined all of its input tuples. For example, hash join is a blocking operation: before it can output any result, both of its inputs must be fetched entirely and partitioned. Indexed nested-loop join, on the other hand, can output result tuples as soon as it receives tuples of the outer relation. It is therefore pipelined on its outer relation and blocking on its indexed (inner) input, because the index must be built completely before the indexed nested-loop join starts executing.

In some cases, however, we want to join two inputs that are both pipelined into the join operation, and neither input is already sorted. For such cases, we use an alternative approach known as the double-pipelined join technique. Its evaluation algorithm is called the double-pipelined join algorithm.
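To make the blocking/pipelined distinction concrete, here is a minimal Python sketch of an indexed nested-loop join written as a generator. It assumes that relations are lists or iterators of tuples and that a plain dict stands in for the index on the inner relation; the function name `indexed_nested_loop_join` and its key-position parameters are hypothetical. The index is built from the entire inner input before anything is produced (blocking), while each outer tuple is probed and its matches are yielded as soon as it arrives (pipelined).

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def indexed_nested_loop_join(outer: Iterable[Row], inner: Iterable[Row],
                             outer_key: int, inner_key: int
                             ) -> Iterator[Tuple[Row, Row]]:
    """Join outer and inner on outer[outer_key] == inner[inner_key]."""
    # Blocking on the inner input: the whole index (a dict standing in for a
    # real index) is built before the first result can be produced.
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row in inner:
        index[row[inner_key]].append(row)

    # Pipelined on the outer input: each outer tuple is probed as soon as it
    # arrives, and its matches are yielded immediately.
    for o in outer:
        for i in index.get(o[outer_key], []):
            yield (o, i)


if __name__ == "__main__":
    def outer_stream() -> Iterator[Row]:
        for row in [(1, "a"), (2, "b"), (3, "c")]:
            print("outer tuple arrives:", row)
            yield row

    inner = [(1, "x"), (3, "y"), (3, "z")]
    for pair in indexed_nested_loop_join(outer_stream(), inner, 0, 0):
        print("  result:", pair)
```

Because the outer input is consumed lazily, this sketch works even when the outer relation is an unbounded stream: the first match is returned as soon as a matching outer tuple arrives. The inner input, by contrast, must be finite so that its index can be completed.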
Double-Pipelined Join Algorithm

The double-pipelined join algorithm is as follows:

    done_r = false;
    done_s = false;
    r = ∅;
    s = ∅;
    result = ∅;
    while not done_r or not done_s do
    begin
        if the queue is empty, wait until it is not empty;
        remove the top entry t from the queue;
        if t = End_r then done_r = true
        else if t = End_s then done_s = true
        else if t is from input r then
        begin
            r = r ∪ {t};
            result = result ∪ ({t} ⋈ s);
        end
        else /* t is from input s */
        begin
            s = s ∪ {t};
            result = result ∪ (r ⋈ {t});
        end
    end

The algorithm above operates on two input relations r and s, whose tuples are assumed to be pipelined. Tuples arriving from both r and s are placed in a single queue for processing. End_r and End_s are special queue entries that act as end-of-file markers. Each is inserted into the queue only after all tuples of r (respectively s) have been generated. For efficient evaluation, appropriate indices should be built on both r and s, and these indices must be kept up to date as tuples are added to the relations.

The algorithm also assumes that both inputs fit in memory. However, the double-pipelined join technique can be extended to inputs that are larger than memory. The join proceeds as usual until the available memory becomes full. At that point, the tuples of r and s that have arrived so far are treated as partitions r_0 and s_0, respectively. Tuples of r and s that arrive subsequently are assigned to partitions r_1 and s_1, respectively. These tuples are written to disk and are not added to the in-memory index.
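The in-memory version of the pseudocode can be sketched in Python as follows. This is an illustrative version, not a definitive implementation. It assumes the queue is an iterable whose entries are either the strings `"End_r"`/`"End_s"` (the end-of-file markers) or pairs `("r", row)`/`("s", row)`, and it uses dicts as in-memory hash indices on r and s, which makes it a double-pipelined hash join. The function and constant names are hypothetical, and instead of waiting on an empty queue, this single-threaded sketch simply reads the next pre-filled entry.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple, Union

Row = Tuple[Any, ...]
END_R = "End_r"  # hypothetical end-of-file marker for input r
END_S = "End_s"  # hypothetical end-of-file marker for input s
Entry = Union[str, Tuple[str, Row]]


def double_pipelined_join(queue: Iterable[Entry], r_key: int, s_key: int
                          ) -> Iterator[Tuple[Row, Row]]:
    """Yield (r_row, s_row) pairs with r_row[r_key] == s_row[s_key]."""
    done_r = done_s = False
    r_index: Dict[Any, List[Row]] = defaultdict(list)  # hash index on r
    s_index: Dict[Any, List[Row]] = defaultdict(list)  # hash index on s
    entries = iter(queue)

    while not done_r or not done_s:
        # A real system would wait here while the queue is empty; this
        # single-threaded sketch just takes the next pre-filled entry.
        t = next(entries, None)
        if t is None:
            raise ValueError("queue ended before both end-of-file markers")
        if t == END_R:
            done_r = True
        elif t == END_S:
            done_s = True
        else:
            source, row = t
            if source == "r":
                key = row[r_key]
                r_index[key].append(row)            # r = r ∪ {t}
                for s_row in s_index.get(key, []):  # {t} ⋈ s
                    yield (row, s_row)
            else:  # t is from input s
                key = row[s_key]
                s_index[key].append(row)            # s = s ∪ {t}
                for r_row in r_index.get(key, []):  # r ⋈ {t}
                    yield (r_row, row)
```

Each result pair is produced exactly once, at the moment the later of its two tuples arrives: an r tuple probes only the s tuples already indexed, and vice versa. For example, with a queue that begins `("r", (7, "a")), ("s", (7, "b"))`, the pair `((7, "a"), (7, "b"))` is yielded as soon as the second entry is read, before either input has ended.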
Before being written to disk, however, each tuple assigned to r_1 is used to probe s_0, and each tuple assigned to s_1 is used to probe r_0. As a result, the joins of r_1 with s_0 and of s_1 with r_0 are also carried out in a pipelined fashion. After both r and s have been processed completely, the join of the r_1 tuples with the s_1 tuples must be computed to complete the join operation. Any of the join methods discussed earlier can be used to join r_1 with s_1.

Note: If hash indices are used on the relations r and s, the resulting method is known as the double-pipelined hash join technique.
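The memory-overflow variant described above can be sketched in Python as follows. In this hypothetical version, memory is modeled as a limit on the number of tuples held in the in-memory hash indices (`memory_limit`, an assumption for illustration), and Python lists stand in for the on-disk partitions r_1 and s_1. Once the limit is reached, each arriving r tuple probes s_0 and is appended to r_1, and each arriving s tuple probes r_0 and is appended to s_1. After both end-of-file markers arrive, r_1 ⋈ s_1 is computed with an ordinary hash join, which is just one of several methods that could be used at that step.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple, Union

Row = Tuple[Any, ...]
END_R = "End_r"  # hypothetical end-of-file marker for input r
END_S = "End_s"  # hypothetical end-of-file marker for input s
Entry = Union[str, Tuple[str, Row]]


def double_pipelined_join_spill(queue: Iterable[Entry], r_key: int, s_key: int,
                                memory_limit: int) -> Iterator[Tuple[Row, Row]]:
    """Double-pipelined hash join that spills to r1/s1 once memory is full."""
    done_r = done_s = False
    r0: Dict[Any, List[Row]] = defaultdict(list)  # in-memory index on r0
    s0: Dict[Any, List[Row]] = defaultdict(list)  # in-memory index on s0
    r1: List[Row] = []  # stands in for the on-disk partition r1
    s1: List[Row] = []  # stands in for the on-disk partition s1
    in_memory = 0       # tuples currently held in r0 and s0

    entries = iter(queue)
    while not done_r or not done_s:
        t = next(entries, None)
        if t is None:
            raise ValueError("queue ended before both end-of-file markers")
        if t == END_R:
            done_r = True
        elif t == END_S:
            done_s = True
        else:
            source, row = t
            memory_full = in_memory >= memory_limit
            if source == "r":
                key = row[r_key]
                for s_row in s0.get(key, []):  # r0 ⋈ s0, or r1 ⋈ s0 once full
                    yield (row, s_row)
                if memory_full:
                    r1.append(row)  # "written to disk", not indexed in memory
                else:
                    r0[key].append(row)
                    in_memory += 1
            else:  # t is from input s
                key = row[s_key]
                for r_row in r0.get(key, []):  # r0 ⋈ s0, or r0 ⋈ s1 once full
                    yield (r_row, row)
                if memory_full:
                    s1.append(row)  # "written to disk", not indexed in memory
                else:
                    s0[key].append(row)
                    in_memory += 1

    # Both inputs are exhausted: finish with r1 ⋈ s1. Any join method works;
    # this sketch uses an ordinary (blocking) hash join built on r1.
    r1_index: Dict[Any, List[Row]] = defaultdict(list)
    for row in r1:
        r1_index[row[r_key]].append(row)
    for s_row in s1:
        for r_row in r1_index.get(s_row[s_key], []):
            yield (r_row, s_row)
```

The four pieces r_0 ⋈ s_0, r_1 ⋈ s_0, r_0 ⋈ s_1 and r_1 ⋈ s_1 are disjoint and together cover r ⋈ s, so the output is the same for every `memory_limit`. Only the point at which results become available changes: a smaller limit defers more of the work to the final r_1 ⋈ s_1 step.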
