Advanced Query Optimization

There are various topics which lead the query optimization to its advanced level.

In this section, we will discuss a few of such topics.

Top-K Optimization

A database system is used for fetching data from it. But there are some queries that the user has given that access results sorted on some attributes and require only top K results for some K. Also, some queries support bound K, or limit K clause, which accesses the top K results. But, some queries do not support the bound K. For such queries, the optimizer specifies a hint that indicates the results of the query retrieved should be the top k results only. It does not matter if the query generates more number of results, including the top k results. In cases, the value of K is small, and then also, if the query optimization plan produces the entire set of results, sorts them and generates the top K results. Such a step is not at all effective as well as inefficient as it may likely to discard most of the computed intermediate results. Therefore, we use several methods to optimize such top-k queries.

Two such methods are:

  • Using pipelined query evaluation plans for producing the results in sorted order.
  • Estimating the highest value on the sorted attributes that will appear in the top K result and introducing the selection predicates used for eliminating the larger values.

Any how, some extra tuples are generated beyond the top-K results. Such tuples are discarded, and if too few tuples are generated that do not reach the top K results, then we need to execute the query again, and also, there is a need to change the selection condition.

Join Minimization

There are different join operations used for processing the given user query. In cases, when queries are generated through views, then for computing the query, it is required to join more number of relations than the actual requirement. To resolve such cases, we need to drop such relations from a join. Such type of solution or method is known as Join Minimization. We have discussed only one of such cases. There are also more numbers of similar cases, and we can apply the join minimization there also.

Optimization of Updates

An update query is used to make changes in the already persisted data. An update query often involves subqueries in the set as well as where clauses. So, while optimizing the update, both these subqueries must also get included. For example, if a user wants to update the score as 97 of a student in a student table whose roll_no is 102. The following update query will be used:

update student set score = 97 where roll_no = 102

However, if the updates involve a selection on the updated column, we need to handle such updations carefully. If the update is done during the selection performed by an index scan, then we need to re-insert an updated tuple in the index ahead of the scan. Also, several problems can arise in the updation of the subqueries, whose result is affected by the update.

Halloween Problem

The problem was named so because it was first identified on Halloween Day at IBM. The problem of an update that affects execution of a query associated with the update, known as the Halloween problem. But, we can avoid this problem by breaking up the execution plan by executing the following steps:

  • Executing the queries that define the update first
  • Creating a list of affected tuples
  • At last, updating the tuples and indices.

Thus, following these steps increases the execution cost of the query evaluation plan.

We can optimize the update plans by checking if the Halloween problem can occur, and if it cannot occur, perform the update during the processing of the query. It, however, reduces the update overheads. We can understand this with an example, suppose that Halloween's problem cannot occur if the index attributes are not affected by the updates. However, if it does and if the updates also decrease the value, even if the index is scanned in increasing order, in that case, it will not encounter the updated tuples again during the scan process. But in such cases, it can update the index even the query is being executed. Thus, it will reduce the overall cost and lead to an optimized update.

Another method of optimizing such update queries that result or concludes a large number of updates is by collecting all the updates as a batch. After collecting, apply these updates batch separately to each affected index. But, before applying an updates batch to an index, it is required to sort the batch in index order for that index. Thus, such sorting of the batch will reduce the amount of random I/O, which are needed to update the indices at a great height.

Therefore, we can perform such optimization of updates in most of the database systems.

Multi Query Optimization and Shared Scans

We can understand the multi-query optimization as when the user submits a queries batch. The query optimizer exploits common subexpressions between different queries. It does so to evaluate them once and reuse them whenever required. Thus, for complex queries also, we can exploit the subexpression, and it reduces the cost of the query evaluation plan, consequently. So, we need to optimize the subexpressions for different queries. One way of optimization is the elimination of the common subexpression, known as Common subexpression elimination. The common subexpression elimination method optimizes the subexpressions by computing and storing the result. Further, reusing the result whenever the subexpressions occur. Only a few databases perform the exploitation of common subexpressions among the evaluation plans, which are selected for each of the batches of queries.

In some database systems, another form of multi-query optimization is implemented. Such a form of implementation is known as Sharing of relation scans between the queries. Understand the following steps to know the working of the Shared-scan:

  • It does not read the relation in a repeated manner from the disk.
  • It reads data only once from the disk for one time for every query which needs to scan a relation.
  • Finally, it pipelines to each of the queries.

Such a method of shared-scan optimization is useful when multiple queries perform a scan on a fact table or a single large relation.

Parametric Query Optimization

In the parametric query optimization method, query optimization is performed without specifying its parameter values. The optimizer outputs several optimal plans for different parametric values. It outputs the plan only if it is optimal for some possible parameter values. After this, the optimizer stores the output set of alternative plans. Then the cheapest plan is found and selected. Such selection takes very less time than the re-optimization process. In this way, the optimizer optimizes the parameters and leads to an optimized and cost-effective output.