Implementation of Pipelining

To implement a pipeline that evaluates multiple operations of a user query, we can construct a single complex operation that merges those operations. However, such an approach is feasible and efficient only for some frequently occurring cases.
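As a rough illustration of merging operations, the sketch below compares a selection followed by a projection, evaluated as two separate steps with a temporary result, against a single merged operation that does both in one pass. The relation EMPLOYEES, its columns, and the function names are hypothetical and chosen only for this example.

```python
# Hypothetical relation of (id, name, salary) tuples; not from the original text.
EMPLOYEES = [
    (1, "Asha", 52000),
    (2, "Ben", 38000),
    (3, "Chen", 61000),
]


def select_then_project(rows, min_salary):
    """Two separate operations: the selection result is stored temporarily."""
    selected = [r for r in rows if r[2] >= min_salary]  # selection
    return [(r[1],) for r in selected]                  # projection


def select_project_merged(rows, min_salary):
    """One merged operation: each tuple is filtered and projected in one pass."""
    result = []
    for r in rows:
        if r[2] >= min_salary:       # selection condition
            result.append((r[1],))   # projection applied immediately
    return result


separate = select_then_project(EMPLOYEES, 50000)
merged = select_project_merged(EMPLOYEES, 50000)
```

Both functions return the same tuples, but the merged version never builds the intermediate selection result. The drawback is that a new merged function has to be written for every combination of operations, which is why this only pays off for common combinations.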

The system can execute a pipeline in either of the following ways:

Demand-driven Pipeline

In the demand-driven pipeline, the system repeatedly requests tuples from the operation at the top of the pipeline. Each time the operation receives a request, it computes the next tuple (or tuples) to be returned and then returns them. If the inputs of the operation are not pipelined, the next tuples are computed directly from the input relations, while the system keeps track of which tuples have been returned so far. If the operation has pipelined inputs, it in turn requests tuples from those inputs, uses the tuples it receives to compute its own output tuples, and passes them up to its parent. Thus, in the demand-driven pipeline, tuples are produced only in response to requests that originate from the system.

Implementing Demand-driven Pipeline

In the demand-driven pipeline, each operation is implemented as an iterator that provides three basic functions: open(), next(), and close(). These functions work as follows:

After open() has been invoked, each call to next() returns the next output tuple of the operation.
The implementation of the operation in turn invokes open() and next() on its inputs, so that input tuples are available when needed.
Once no more tuples are required, the close() function tells the iterator so.
Between calls, the iterator maintains its state of execution, so that successive next() calls return successive result tuples.
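The iterator interface described above can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the class names Scan, Select, and Project, the sample relation ROWS, and the use of None as an end-of-input marker are all assumptions made for this example.

```python
class Scan:
    """Leaf operation: reads tuples from a stored (non-pipelined) relation."""

    def __init__(self, relation):
        self.relation = relation
        self.pos = None

    def open(self):
        self.pos = 0  # execution state kept between next() calls

    def next(self):
        if self.pos >= len(self.relation):
            return None  # assumed end-of-input marker
        row = self.relation[self.pos]
        self.pos += 1
        return row

    def close(self):
        self.pos = None


class Select:
    """Returns only the input tuples that satisfy the predicate."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def open(self):
        self.child.open()  # open() is passed down to the pipelined input

    def next(self):
        while True:
            row = self.child.next()  # request a tuple from the input
            if row is None or self.predicate(row):
                return row

    def close(self):
        self.child.close()


class Project:
    """Keeps only the listed column positions of each input tuple."""

    def __init__(self, child, columns):
        self.child = child
        self.columns = columns

    def open(self):
        self.child.open()

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return tuple(row[c] for c in self.columns)

    def close(self):
        self.child.close()


# Hypothetical relation of (id, name, salary) tuples.
ROWS = [(1, "Asha", 52000), (2, "Ben", 38000), (3, "Chen", 61000)]

# The system pulls tuples from the operation at the top of the pipeline.
plan = Project(Select(Scan(ROWS), lambda r: r[2] >= 50000), [1])
plan.open()
results = []
while True:
    row = plan.next()
    if row is None:
        break
    results.append(row)
plan.close()
```

Each call to plan.next() triggers only as much work in Select and Scan as is needed to produce one more output tuple, so no intermediate relation is ever stored.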

Producer-driven Pipeline

The producer-driven pipeline differs from the demand-driven pipeline in that operations do not wait for requests before producing tuples. Instead, they generate tuples eagerly. Each operation is modeled as a separate thread or process within the system, which takes a stream of tuples from its pipelined inputs and generates a stream of tuples for its output.

Implementing Producer-driven Pipeline

The producer-driven pipeline is implemented differently from the demand-driven pipeline. The implementation proceeds in the following steps:

For each pair of adjacent operations, the system creates a buffer that holds the tuples being passed from one operation to the next.
Once the buffers have been created, the processes or threads corresponding to the different operations execute concurrently.
Each operation at the bottom of the pipeline continually produces output tuples and puts them in its output buffer until the buffer becomes full.
As soon as an operation uses a tuple from a pipelined input, it removes that tuple from its input buffer.
If the output buffer becomes full, the operation waits until the buffer has space for more tuples. Because the parent operation is the one that removes tuples from the buffer, the operation is in effect waiting for its parent to consume them.
When the buffer has space again, the operation resumes producing tuples and continues until the buffer becomes full again.
The operation repeats this process until it has generated all of its output tuples.

Note: The system must switch to another operation when an operation's output buffer is full, or when its input buffer is empty and it needs more input tuples to generate further output tuples.
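The steps above can be sketched with Python threads and bounded queues from the standard library. This is only an illustration under stated assumptions: the function names scan and select, the DONE end-of-stream marker, the buffer size of 2, and the sample relation ROWS are all hypothetical. A real database system would schedule its operations in its own way.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def scan(relation, out_buf):
    """Bottom operation: eagerly pushes every tuple into its output buffer."""
    for row in relation:
        out_buf.put(row)  # blocks while the output buffer is full
    out_buf.put(DONE)


def select(predicate, in_buf, out_buf):
    """Consumes tuples from its input buffer and pushes the matching ones on."""
    while True:
        row = in_buf.get()  # removes the tuple; blocks while the buffer is empty
        if row is DONE:
            break
        if predicate(row):
            out_buf.put(row)
    out_buf.put(DONE)


# Hypothetical relation of (id, amount) tuples.
ROWS = [(i, i * 1000) for i in range(1, 11)]

# One bounded buffer for each pair of adjacent operations.
scan_to_select = queue.Queue(maxsize=2)
select_to_root = queue.Queue(maxsize=2)

workers = [
    threading.Thread(target=scan, args=(ROWS, scan_to_select)),
    threading.Thread(
        target=select,
        args=(lambda r: r[1] >= 7000, scan_to_select, select_to_root),
    ),
]
for w in workers:
    w.start()

# The consumer at the top of the pipeline drains the final buffer.
results = []
while True:
    row = select_to_root.get()
    if row is DONE:
        break
    results.append(row)

for w in workers:
    w.join()
```

Here the blocking put() and get() calls play the role of the waiting described in the steps: a thread that cannot proceed is suspended, and the thread scheduler switches to an operation that can.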

Difference between Producer-driven Pipeline and Demand-driven Pipeline

The demand-driven pipeline and the producer-driven pipeline differ in the following ways:

Demand-driven Pipeline	Producer-driven Pipeline
It is similar to pulling data up an operation tree from the top.	It is similar to pushing data up an operation tree from the bottom.
Tuples are generated lazily, only when requested.	Tuples are generated eagerly.
It is easy to implement.	It is harder to implement.
It is the approach most commonly used for evaluating an expression.	It is rarely used in practice, but it is well suited to parallel processing systems.