Double-Pipelined Join AlgorithmIn our previous section, we understood about pipelining and the ways through which a system creates and implement pipeline to evaluate multiple operations using demand-driven or producer-driven pipelining. Here, we will discuss an evaluation algorithm for implementing pipelining. There are several operations used in accessing the data from any particular system. But few of them are inherently blocking operations, and others are not. Blocking operations are those which do not output any results until all the input tuples are examined. For example, operations such as hash-join is a blocking operation as before outputting any result. It needs both its input to be fetched entirely as well as partitioned. On the other hand, the indexed nested loop is able to output the resulting tuples as soon it gets tuples for the outer relation. So, it is pipelined at its outer relation and blocking on its indexed input. It is so because the indexed is created completely before the execution of the indexed nested loop. But in some cases where we want to perform join operation on two inputs. However, both inputs are not already sorted, and we need to put them in a pipeline of the join operation. For such cases, we use an alternative approach known as the Double-pipelined join method. The double-pipelined join technique uses an evaluation algorithm for the implementation of the pipeline, which is known as the Double-pipelined join algorithm. Double-pipelined Join AlgorithmBelow we have described the double-pipelined join algorithm: doner = false; dones = false; r = Ø; s = Ø; result = Ø; while !doner or !dones do begin if queue is empty, wait until it is not empty; t = top entry in the queue; if t = Endr then doner = true else if t = Ends then dones = true else if t is from input r then begin r = r U {t}; result = result U ({t} ⋈ s); end /* t is from input s */ else begin s = s U {t}; result = result U (r ⋈ {t}); end end The above-described algorithm is performed on two input relations r and s. It is assumed that the input tuples of these relations are pipelined. The tuples which are provided to both r and s relations are queued to process in one queue. In the algorithm, Endr and Ends are the special queues, which are the end-of-file markers. These special queues are inserted in the queue only after generating all the tuples from relation r and s, respectively. Also, as more tuples get added to relations r and s, appropriate indices should be built on both the relations. Keeping the indices upto date leads to an efficient evaluation of the operation. In this algorithm, we have also assumed that both the inputs are fit in memory. But, the double-pipelined join technique also supports the case in which the size of the two inputs exceeds the size of memory, i.e., larger than the memory size. It is because the double-pipelined join method can work as usual until the available memory becomes full. When the memory becomes full, the arrived tuples of both relations r and s upto that point can be treated as being in r0 and s0 partitions, respectively. The tuples which have subsequently arrived for relations r and s are assigned to partitions r1 and s1. Although these assigned tuples to partitions r1 and s1 are not included to the in-memory index, they are written to the disk. Also, before writing these tuples assigned to r1 and s1 to the disk, they are used to probe partitions r0 and s0. As a result, it also concludes the join of r1 with s0 and s0 with r1 in a pipeline. After processing both relations r and s completely, we compute the join of r1 tuples with s1 tuples in order to complete the join operation. Also, we can use any join operation or method for performing join on partition r1 with s1 partition. Note: If we implement pipeline by using hash indices on any relations r and s, such a method is known as Double-pipelined hash join method.Next TopicDBMS Tutorial |