Materialization in Query Processing

In the previous, we took a brief introduction about materialization and how to evaluate multiple operations of an expression.

Materialization is an easy approach for evaluating multiple operations of the given query and storing the results in the temporary relations. The result can be the output of any join condition, selection condition, and many more. Thus, materialization is the process of creating and setting a view of the results of the evaluated operations for the user query. It is similar to the cache memory where the searched data get settled temporarily. We can easily understand the working of materialization through the pictorial representation of the expression. An operator tree is used for representing an expression.

The materialization uses the following approach for evaluating operations of the given expression:

In the operator tree, we begin from the lowest-level operations (at the bottom of the tree) in the expression. The inputs to the lowest level operations are stored in the form of relations in the database. For example, suppose we want to fetch the name of the student as 'John' from the 'Student' relation.
The relation expression will be:
σ _{name= "John"} (Student)
In this example, there is only one operation of selecting the name from the given relation. Also, this operation is the lowest-level operation. So, we will begin by evaluating this selection operation.
Now, we will use an appropriate algorithm which is suitable for evaluating the operation. Like in our example, we will use an appropriate selection algorithm for retrieving the name from the Student relation.
Then, store the result of the operation in the temporary relations.
We use these temporary relations for evaluating the next-level operation in the operator tree. The result works as an input for every next level up in the tree.
Repeat these steps until all operators at the root of tree will be evaluated, and the final result of the expression will be generated.

We also call the described evaluation as Materialized evaluation because the result of one operation is materialized and used in the evaluation of next operation and so on.

Cost Estimation of Materialized Evaluation

The process of estimating the cost of the materialized evaluation is different from the process of estimating the cost of an algorithm. It is because in analyzing the cost of an algorithm, we do not include the cost of writing the results on to the disks. But in the evaluation of an expression, we not only compute the cost of all operations but also include the cost of writing the result of currently evaluated operation to disk.

To estimate the cost of the materialized evaluation, we consider that results are stored in the buffer, and when the buffer fills completely, the results are stored to the disk.

Let, a total of b_r number of blocks are written. Thus, we can estimate b_r as:

b_r = n_r/f_r.

Here, n_r is the estimated number of tuples in the result relation r and fr is the number of records of relation r that fits in a block. Thus, fr is a blocking factor of the resultant relation r.

With this, we also need to calculate the transfer time by estimating the number of required disks. It is so because the disk head may have moved in-between the successive writes of the block. Thus, we can estimate:

Number of seeks = Γ b_r/ b_bꓶ

Here, b_b defines the size of the output buffer, i.e., measured in blocks.

We can optimize the cost estimation of the materialization process by using the concept of double buffering. Double buffering is the method of using two buffers, where one buffer executes the algorithm continuously, and the other is being written out. It makes the algorithm to execute more fastly by performing CPU activities parallel with I/O activities. We can also reduce the number of seeks by allocating the extra blocks to the output buffer and altogether writing out multiple blocks.

Next TopicMerge Join Algorithm

← prev next →