Analytical Functions in SQL
Analytical functions allow the user to calculate an aggregate value depending on the grouping of rows. These functions differ from the aggregate functions as they return multiple rows for each group. The rows returned are known as windows that the analytic_clause can define.
For every row in the database, a sliding window is defined. The sliding window determines the limit of rows that will be used to compute the values for the current row. The size of the sliding window can either be a determined number of rows or may depend on logical intervals such as time.
Analytical functions are the last set of operations performed in a query. The only exception is the ORDER BY clause used in the SQL. All joins and all WHERE, GROUP BY, and HAVING clauses are performed before implementing the analytical functions. Therefore, the analytical functions can only be implemented with the select list or when the ORDER BY clause is used.
Analytical functions commonly calculate cumulative, moving, centered, and reporting aggregates.
The Syntax for Analytical Functions in SQL
The syntax to implement the analytical functions in the SQL is as follows:
Components in the Analytical Function
To implement the analytic function in the query, specify the name of the functions you want to execute. You can refer to the list of analytic functions to determine the function to be implemented.
An analytical function may take 0-3 argument values. The arguments entered in the functions can be numerical values or non-numeric data types that can be changed into numeric data types.
The SQL software determines which argument has the highest numeric precedence when multiple arguments are passed in the analytic functions. Then it converts the arguments left to the particular data type. The value returned by the function also has the same data type of argument with the highest precedence. This can be changed if the user specifies another data type for the particular function.
OVER is an analytic_clause the user uses to indicate that the analytical function mentioned is implemented on a query result set. Once the FROM, WHERE, GROUP BY, and HAVING clauses are executed, this clause is implemented. The user can specify the analytic functions with an OVER clause in the list or using the ORDER BY clause. These clauses enable the user to filter the results generated by implementing the analytic function. It allows one to nest these functions in a parent query and then filter the results of the nested subqueries in the SQL.
Notes on the analytic_clause:
Another type of analytic clause is PARTITION BY clause. This clause is used to partition the query result set into groups based on one or multiple value expressions. If the user decides to remove this clause from the query, then the analytic function will treat all the query result rows as one group.
If the user wants to implement the query_partition_clause in the analytic function, then the user must execute the upper branch of the syntax (without needing the parenthesis). If the user wants to implement the clause in a model query or a partitioned outer join, use the syntax with the parenthesis.
It allows the user to implement several analytical functions in the same query. This can be done by using the same or different PARTITION BY keys. If the objects being queried have a parallel attribute, the user can also mention an analytical function using the query_partition_clause. This will also make the computations performed parallel as well.
Valid values that can be used in the value_expr are constants, columns, non-analytic functions, function expressions, or expressions that contain any of those mentioned above.
This clause is used in the analytical functions to mention the order in which the data will be arranged within the partition. The user can arrange the values within the partition on multiple keys for each analytic function. Each value passed in the analytical function is defined by a value_expr, and an ordering sequence qualifies the expression.
For different analytical functions, the user can mention multiple ordering expressions. This is especially useful when the user implements functions that rank values as the second expression can be used to resolve the ties between identical values for the first expression.
If the order_by_clause results in the same values for several rows, then the analytical function will behave as follows:
The following analytical functions will provide the same results for each of the rows:
ROW_NUMBER assigns a different value to each row if there any two or more rows have the same value after implementing the order_by_clause. This different value is computed by the order in which the system processes the row. This method may be non-deterministic if the ORDER BY does not ensure a total ordering.
The result can differ for all other analytic functions depending on the window specifications. The user can specify a logical window by implementing the RANGE keyword. This will allow all the analytical functions to provide the same result for each row, even with different window specifications. If the user specifies the physical window with the ROW keyword, the result will be non-deterministic.
Limitation on the Order By Clause
Certain limitations are imposed on the implementation of the ORDER BY clause:
When the order_by_clause is implemented along with an analytical function, an expression (expr) must be passed to the clause. The SIBLINGS keyword is invalid (it could only be used in a hierarchical query). The position and column aliases (c_alias) are also invalid in this scenario. Otherwise, this will return the same result as the order_by_clause used to order the overall query or subquery.
An analytic function that implements the RANGE keyword can use multiple sorting keys in the order by clause if it mentions any of the mentioned below windows:
These four windows can only have multiple sorting keys in the order_by_clause. All the other boundaries can only have one sort key in the ORDER BY clause of the analytic function. This limitation does not apply to the window boundaries determined by the ROW keyword.
ASC | DESC
These keywords are used to specify the ordering sequence. The order can either be ascending or descending. By default, the ordering sequence is set to ASC.
NULLS FIRST | NULLS LAST
It determines whether returned rows with null values should appear in the ordering sequence at the beginning or last. While the NULLS LAST is set as the default value when in the ascending order, and the NULLS FIRST is the default value for the descending order.
Analytical functions are only operated on the rows in the order determined by the order_by_clause of the function. Since the order_by_clause of the function does not ensure the order of the result produced. The user can implement the order_by_clause of the query to ensure that the final result is ordered.
Some analytic functions enable the windowing_clause. Some of the analytical function that allows the user to use the windowing_clause are as follows:
ROWS | RANGE
BETWEEN ... AND
These clauses are used to determine a particular starting point and the ending point for the window. The first expression entered before AND specifies the beginning point for the window, and the second expression entered after the AND specifies the ending point for the window.
Suppose the user does not use BETWEEN keyword and only mentions the ending point for the window. Then the SQL will treat the point as the beginning point, and the ending point of the window will be set as the current row.
The user uses the UNBOUNDED PRECEDING to specify that the window begins with the first row of the partition. This can only be used as the beginning specification and not for the ending point specification.
The user uses the UNBOUNDED FOLLOWING to specify that the window ends with the last row of the partition. This can only be used as the ending specification and not as the beginning point specification.
When the user uses the CURRENT ROW as a starting point, then it specifies that the window will start from the current row or value. This will also depend on whether the user has mentioned the ROW or RANGE. If specified, the window's ending point cannot be the value_expr PRECEDING.
When the user uses the CURRENT ROW as an endpoint, then it specifies that the window will end from the current row or value. This will also depend on whether the user has mentioned the ROW or RANGE. If specified, the window's beginning point cannot be the value_expr FOLLOWING.
value_expr PRECEDING or value_expr FOLLOWING
For RANGE or ROW:
Some Common Analytical Functions
Some of the analytical functions and the functions performed by them are as follows: