Materialization in Query Processing
In the previous section, we briefly introduced materialization and how to evaluate multiple operations of an expression.
Materialization is a simple approach to evaluating the multiple operations of a given query in which the result of each operation is stored in a temporary relation. The result can be the output of a join, a selection, or any other operation. Thus, materialization is the process of creating and storing the results of the evaluated operations of the user query. It is similar to a cache, where data that has been looked up is kept temporarily. The working of materialization is easiest to understand through a pictorial representation of the expression: an operator tree is used to represent an expression.
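The materialization approach evaluates the operations of a given expression as follows. It starts with the lowest-level operations in the operator tree, whose inputs are relations stored in the database, evaluates them, and stores their results in temporary relations. It then uses these temporary relations as inputs to the operations at the next level up in the tree, and repeats this process until the operation at the root of the tree has been evaluated, producing the final result of the expression.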
This approach is also called materialized evaluation, because the result of each operation is materialized (stored in a temporary relation) and then used in the evaluation of the next operation, and so on.
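Materialized evaluation can be illustrated with a small Python sketch. The `Node` class, the `evaluate_materialized` function, and the `employee`/`department` relations are hypothetical and exist only for illustration; relations are modeled as in-memory lists of dicts rather than disk-resident tables. The sketch shows the bottom-up order: every operation's inputs are evaluated and stored as temporary relations before the operation itself runs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Relation = List[dict]  # a relation modeled as a list of tuples (dicts)


@dataclass
class Node:
    """Hypothetical operator-tree node: an operation with child inputs,
    or a leaf holding a relation stored in the database."""
    name: str
    op: Optional[Callable[..., Relation]] = None  # None marks a leaf
    children: List["Node"] = field(default_factory=list)
    base: Optional[Relation] = None               # stored relation for leaves


def evaluate_materialized(node: Node, temps: Dict[str, Relation]) -> Relation:
    """Evaluate the tree bottom-up, storing every operator's result in `temps`."""
    if node.op is None:
        return node.base  # leaves are read directly from the database
    # All inputs are evaluated (and materialized) before this operation runs.
    inputs = [evaluate_materialized(child, temps) for child in node.children]
    temps[node.name] = node.op(*inputs)  # materialize as a temporary relation
    return temps[node.name]


# Hypothetical expression: project(select(employee) JOIN department)
employee = [
    {"name": "Ana", "dept": 1, "salary": 60000},
    {"name": "Raj", "dept": 2, "salary": 40000},
    {"name": "Lee", "dept": 1, "salary": 75000},
]
department = [{"dept": 1, "dname": "Sales"}, {"dept": 2, "dname": "HR"}]


def select_high(r):
    return [t for t in r if t["salary"] > 50000]


def join_dept(r, s):
    return [{**t, **u} for t in r for u in s if t["dept"] == u["dept"]]


def project_names(r):
    return [{"name": t["name"], "dname": t["dname"]} for t in r]


tree = Node("project", project_names, [
    Node("join", join_dept, [
        Node("select", select_high, [Node("employee", base=employee)]),
        Node("department", base=department),
    ]),
])

temps: Dict[str, Relation] = {}
result = evaluate_materialized(tree, temps)
print(list(temps))  # ['select', 'join', 'project']: lowest level first
print(result)       # [{'name': 'Ana', 'dname': 'Sales'}, {'name': 'Lee', 'dname': 'Sales'}]
```

The order in which `temps` is filled mirrors the operator tree from the leaves upward; a real system would write each temporary relation to disk instead of keeping it in a dictionary.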
Cost Estimation of Materialized Evaluation
Estimating the cost of materialized evaluation differs from estimating the cost of an individual algorithm. When we analyze the cost of an algorithm, we do not include the cost of writing its results to disk. When we evaluate an expression, however, we compute not only the cost of all operations but also the cost of writing the result of each evaluated operation to disk.
To estimate the cost of materialized evaluation, we assume that results are accumulated in a buffer and that, when the buffer is completely full, its contents are written to disk.
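The buffering policy just described can be simulated in a few lines of Python. This is a minimal sketch under stated assumptions: `OutputBuffer`, `buffer_blocks`, and `records_per_block` are hypothetical names, and the "disk" is simply a Python list. The class counts how many blocks reach disk and how many separate buffer flushes occur.

```python
import math


class OutputBuffer:
    """Hypothetical output buffer of `buffer_blocks` blocks, each holding
    `records_per_block` tuples; it is written to 'disk' only when full."""

    def __init__(self, buffer_blocks: int, records_per_block: int):
        self.capacity = buffer_blocks * records_per_block  # tuples in a full buffer
        self.records_per_block = records_per_block
        self.pending = []        # tuples currently held in memory
        self.disk = []           # stands in for the temporary relation on disk
        self.blocks_written = 0
        self.flushes = 0         # number of separate buffer-to-disk writes

    def write(self, tup):
        self.pending.append(tup)
        if len(self.pending) == self.capacity:
            self.flush()         # buffer is completely full: write it out

    def flush(self):
        if not self.pending:
            return
        self.disk.extend(self.pending)
        self.blocks_written += math.ceil(len(self.pending) / self.records_per_block)
        self.flushes += 1
        self.pending = []


buf = OutputBuffer(buffer_blocks=4, records_per_block=10)
for i in range(1005):
    buf.write(("tuple", i))
buf.flush()  # write out the final, partly filled buffer
print(buf.blocks_written, buf.flushes)  # 101 26
```

With 1005 tuples and a 40-tuple buffer, 25 full flushes write 100 blocks and one final partial flush writes the remaining 5 tuples in 1 block.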
Let $b_r$ be the total number of blocks written. We can estimate $b_r$ as:
$$b_r = \frac{n_r}{f_r}$$
Here, $n_r$ is the estimated number of tuples in the result relation $r$, and $f_r$ is the number of records of relation $r$ that fit in a block. Thus, $f_r$ is the blocking factor of the result relation $r$.
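The block estimate can be checked with a small Python helper. The function name `result_blocks` is hypothetical, and rounding up is an assumption added here: the formula divides exactly, but a partly filled final block still occupies a whole block on disk.

```python
def result_blocks(n_r: int, f_r: int) -> int:
    """Estimate b_r, the number of blocks holding n_r tuples at f_r tuples per block.

    Uses integer ceiling division, assuming a partly filled final block
    is still written as a whole block."""
    return (n_r + f_r - 1) // f_r


print(result_blocks(10_000, 40))  # 250
print(result_blocks(1_001, 10))   # 101
```

Integer ceiling division avoids floating-point rounding for very large tuple counts.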
In addition to the block transfers, we need to estimate the number of disk seeks required, because the disk head may have moved between successive writes of the output buffer. Thus, we can estimate:
$$\text{Number of seeks} = \left\lceil \frac{b_r}{b_b} \right\rceil$$
Here, $b_b$ is the size of the output buffer, measured in blocks.
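Combining the block count with the seek estimate gives a rough write-cost figure. In this sketch, `write_cost` is a hypothetical helper, and the per-block transfer time `t_transfer_ms` and per-seek time `t_seek_ms` are assumed parameters not given in the text; the model charges one transfer per block written plus one seek per buffer-full write.

```python
import math


def write_cost(b_r: int, b_b: int, t_transfer_ms: float, t_seek_ms: float):
    """Estimate the cost of writing b_r result blocks through a b_b-block buffer.

    Assumes one seek per buffer-full write (the disk head may have moved since
    the previous write) plus one transfer per block. The transfer and seek
    times are hypothetical parameters, in milliseconds."""
    seeks = math.ceil(b_r / b_b)
    total_ms = b_r * t_transfer_ms + seeks * t_seek_ms
    return seeks, total_ms


print(write_cost(250, 10, 0.1, 4.0))  # 25 seeks
print(write_cost(250, 50, 0.1, 4.0))  # a larger buffer needs only 5 seeks
```

The transfer term stays the same in both calls; only the seek term shrinks as the output buffer grows.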
We can reduce the cost of materialization by using double buffering. Double buffering uses two output buffers: the algorithm continues executing and filling one buffer while the other is being written out to disk. This makes the algorithm run faster, because CPU activity proceeds in parallel with I/O activity. We can also reduce the number of seeks by allocating extra blocks to the output buffer and writing out multiple blocks at once.
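Double buffering can be sketched in Python with two buffers passed between a producer and a writer thread. This is a minimal illustration under stated assumptions: `double_buffered_write` is a hypothetical function, and `disk` is a list standing in for the temporary relation on disk. The producer hands a full buffer to the writer and immediately continues with the other one, waiting only if the writer has not yet returned it.

```python
import queue
import threading


def double_buffered_write(tuples, buffer_size, disk):
    """Hypothetical sketch of double buffering: the producer fills one buffer
    while a writer thread flushes the other. `disk` is a list standing in for
    the temporary relation on disk."""
    free_buffers = queue.Queue()
    full_buffers = queue.Queue()
    for _ in range(2):  # exactly two buffers
        free_buffers.put([])

    def writer():
        while True:
            buf = full_buffers.get()
            if buf is None:        # sentinel: no more output
                return
            disk.extend(buf)       # stands in for a slow block write
            buf.clear()
            free_buffers.put(buf)  # hand the emptied buffer back

    writer_thread = threading.Thread(target=writer)
    writer_thread.start()

    current = free_buffers.get()
    for tup in tuples:
        current.append(tup)
        if len(current) == buffer_size:
            full_buffers.put(current)     # the writer flushes this buffer...
            current = free_buffers.get()  # ...while we fill the other one
    if current:
        full_buffers.put(current)  # flush the final, partly filled buffer
    full_buffers.put(None)
    writer_thread.join()


disk = []
double_buffered_write(range(1005), buffer_size=40, disk=disk)
print(len(disk), disk[:3])  # 1005 [0, 1, 2]
```

Because there is a single producer and a FIFO queue, tuples reach `disk` in their original order. In CPython, the in-memory `extend` does not truly overlap with the producer because of the global interpreter lock; with real file writes, blocking I/O calls typically release the lock, so filling and writing can proceed in parallel.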