Analytical Functions in SQL

Analytical functions allow the user to calculate an aggregate value depending on the grouping of rows. These functions differ from the aggregate functions as they return multiple rows for each group. The rows returned are known as windows that the analytic_clause can define.

For every row in the database, a sliding window is defined. The sliding window determines the limit of rows that will be used to compute the values for the current row. The size of the sliding window can either be a determined number of rows or may depend on logical intervals such as time.

Analytical functions are the last set of operations performed in a query. The only exception is the ORDER BY clause used in the SQL. All joins and all WHERE, GROUP BY, and HAVING clauses are performed before implementing the analytical functions. Therefore, the analytical functions can only be implemented with the select list or when the ORDER BY clause is used.

Analytical functions commonly calculate cumulative, moving, centered, and reporting aggregates.

The Syntax for Analytical Functions in SQL

The syntax to implement the analytical functions in the SQL is as follows:

Components in the Analytical Function

Analytic function

To implement the analytic function in the query, specify the name of the functions you want to execute. You can refer to the list of analytic functions to determine the function to be implemented.

Arguments

An analytical function may take 0-3 argument values. The arguments entered in the functions can be numerical values or non-numeric data types that can be changed into numeric data types.

The SQL software determines which argument has the highest numeric precedence when multiple arguments are passed in the analytic functions. Then it converts the arguments left to the particular data type. The value returned by the function also has the same data type of argument with the highest precedence. This can be changed if the user specifies another data type for the particular function.

Analytic_Clause

OVER is an analytic_clause the user uses to indicate that the analytical function mentioned is implemented on a query result set. Once the FROM, WHERE, GROUP BY, and HAVING clauses are executed, this clause is implemented. The user can specify the analytic functions with an OVER clause in the list or using the ORDER BY clause. These clauses enable the user to filter the results generated by implementing the analytic function. It allows one to nest these functions in a parent query and then filter the results of the nested subqueries in the SQL.

Notes on the analytic_clause:

The user cannot nest the analytic functions by mentioning another analytic function in any part of the analytic _clause. But you can mention an analytical function in a subquery and calculate another analytical function over it.
You can mention OVER analytic_clause with user-defined analytical functions and predefined analytical functions.
PARTITION BY and ORDER BY are two clauses in the analytic clause that is collation sensitive.

query_partition_clause

Another type of analytic clause is PARTITION BY clause. This clause is used to partition the query result set into groups based on one or multiple value expressions. If the user decides to remove this clause from the query, then the analytic function will treat all the query result rows as one group.

If the user wants to implement the query_partition_clause in the analytic function, then the user must execute the upper branch of the syntax (without needing the parenthesis). If the user wants to implement the clause in a model query or a partitioned outer join, use the syntax with the parenthesis.

It allows the user to implement several analytical functions in the same query. This can be done by using the same or different PARTITION BY keys. If the objects being queried have a parallel attribute, the user can also mention an analytical function using the query_partition_clause. This will also make the computations performed parallel as well.

Valid values that can be used in the value_expr are constants, columns, non-analytic functions, function expressions, or expressions that contain any of those mentioned above.

Order_by_clause

This clause is used in the analytical functions to mention the order in which the data will be arranged within the partition. The user can arrange the values within the partition on multiple keys for each analytic function. Each value passed in the analytical function is defined by a value_expr, and an ordering sequence qualifies the expression.

For different analytical functions, the user can mention multiple ordering expressions. This is especially useful when the user implements functions that rank values as the second expression can be used to resolve the ties between identical values for the first expression.

If the order_by_clause results in the same values for several rows, then the analytical function will behave as follows:

The following analytical functions will provide the same results for each of the rows:

CUME_DIST
DENSE_RANK
NTILE
PERCENT_RANK
RANK

ROW_NUMBER assigns a different value to each row if there any two or more rows have the same value after implementing the order_by_clause. This different value is computed by the order in which the system processes the row. This method may be non-deterministic if the ORDER BY does not ensure a total ordering.

The result can differ for all other analytic functions depending on the window specifications. The user can specify a logical window by implementing the RANGE keyword. This will allow all the analytical functions to provide the same result for each row, even with different window specifications. If the user specifies the physical window with the ROW keyword, the result will be non-deterministic.

Limitation on the Order By Clause

Certain limitations are imposed on the implementation of the ORDER BY clause:

When the order_by_clause is implemented along with an analytical function, an expression (expr) must be passed to the clause. The SIBLINGS keyword is invalid (it could only be used in a hierarchical query). The position and column aliases (c_alias) are also invalid in this scenario. Otherwise, this will return the same result as the order_by_clause used to order the overall query or subquery.

An analytic function that implements the RANGE keyword can use multiple sorting keys in the order by clause if it mentions any of the mentioned below windows:

RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: The short form used for this window is RANGE UNBOUNDED PRECEDING.
RANGE BETWEEN CURRENT ROW AND CURRENT ROW
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

These four windows can only have multiple sorting keys in the order_by_clause. All the other boundaries can only have one sort key in the ORDER BY clause of the analytic function. This limitation does not apply to the window boundaries determined by the ROW keyword.

ASC | DESC

These keywords are used to specify the ordering sequence. The order can either be ascending or descending. By default, the ordering sequence is set to ASC.

NULLS FIRST | NULLS LAST

It determines whether returned rows with null values should appear in the ordering sequence at the beginning or last. While the NULLS LAST is set as the default value when in the ascending order, and the NULLS FIRST is the default value for the descending order.

Analytical functions are only operated on the rows in the order determined by the order_by_clause of the function. Since the order_by_clause of the function does not ensure the order of the result produced. The user can implement the order_by_clause of the query to ensure that the final result is ordered.

Windowing_clause

Some analytic functions enable the windowing_clause. Some of the analytical function that allows the user to use the windowing_clause are as follows:

COUNT
CORR
FIRST_VALUE
LAST_VALUE
AVG

ROWS | RANGE

These keywords are defined for each row in the window. It can be both a physical or logical set of rows that can be used to compute the result for the function. The function can then be implemented in all the rows in the window. The window specified moves through the result query set or partition from top to bottom.
ROWS determines the window using the physical units (rows), while the RANGE keywords determine the window as a logical offset.
If the user wants to implement this clause, the user must specify the order_by_clause. Some of the window's boundaries determined by the RANGE clause enable the user to mention only one expression in the order_by_clause.
The value the analytical function returns has a logical offset, as the windows are always deterministic. Whereas, the value that is returned by the analytic function with the rows as the physical offset for the windows may result in non-deterministic results. In this scenario, the results can only be deterministic if the ordering expression produces a unique ordering for each row in the window. The user may have to specify several columns in the order_by_clause to produce a unique order for the rows.

BETWEEN ... AND

These clauses are used to determine a particular starting point and the ending point for the window. The first expression entered before AND specifies the beginning point for the window, and the second expression entered after the AND specifies the ending point for the window.

Suppose the user does not use BETWEEN keyword and only mentions the ending point for the window. Then the SQL will treat the point as the beginning point, and the ending point of the window will be set as the current row.

UNBOUNDED PRECEDING

The user uses the UNBOUNDED PRECEDING to specify that the window begins with the first row of the partition. This can only be used as the beginning specification and not for the ending point specification.

UNBOUNDED FOLLOWING

The user uses the UNBOUNDED FOLLOWING to specify that the window ends with the last row of the partition. This can only be used as the ending specification and not as the beginning point specification.

CURRENT ROW

When the user uses the CURRENT ROW as a starting point, then it specifies that the window will start from the current row or value. This will also depend on whether the user has mentioned the ROW or RANGE. If specified, the window's ending point cannot be the value_expr PRECEDING.

When the user uses the CURRENT ROW as an endpoint, then it specifies that the window will end from the current row or value. This will also depend on whether the user has mentioned the ROW or RANGE. If specified, the window's beginning point cannot be the value_expr FOLLOWING.

value_expr PRECEDING or value_expr FOLLOWING

For RANGE or ROW:

If the value_expr FOLLOWING is used as the beginning point, then the ending point will be assigned the value in the value_expr FOLLOWING.
If the value_expr PRECEDING is used as the ending point, then the beginning point will be assigned the value in the value_expr PRECEDING.
If the user uses time interval in the numeric format to define the logical window, then the user may require to perform the given conversion functions.
If the user specifies the ROWS, then the value_expr is the physical offset for the window. The parameter is a constant or expression that can be converted into a positive numeric value.
If the user specifies the RANGE, then the value_expr is the logical offset for the window. The parameter is a constant or expression that can be converted into a positive numeric value or a literal interval.
The user can only mention one expression in the order_by_clause. This means that either the value can either be numeric or interval value.
If value_expr contains a numeric value, then the expression used for the ORDER BY must be a numeric or DATE data type.
If value_expr contains an interval value, then the ORDER BY expr must be a DATE data type.
If the user does not want to include the windowing_clause, then the default value is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

Some Common Analytical Functions

Some of the analytical functions and the functions performed by them are as follows:

AVG: This function returns the average value of the expression passed in the function. The functions take any numeric data type as the input parameter. The user can also input a non-numeric datatype that can be implicitly converted into a numerical value. The data type of the value returned by the function is the same as the data type of the input parameter.
COUNT: This analytic function returns the number of rows the query returns. This function can be used as an aggregate or an analytical function.
- If the user can specify DISTINCT, then the user can only use
- The query partition clause of the analytical function. This means the user cannot use this function's order by clause or the windowing clause.
- If the user adds an expression as the function parameter, then the COUNT analytic function returns the number of rows where the value of the particular expression is not null. The user can either count all the rows or rows with a distinct value for the expression.
CUME_DIST: This analytical function computes the cumulative distribution for a particular value in a group of values. The value range returned by this function is greater than 0 or less than equal to 1. If the user computes the cumulative distribution for the same value, then the function will return the same cumulative distribution value.
- This function can have any numeric or non-numeric data type that can be implicitly changed to a numeric one. The DBMS is responsible for determining the argument with the highest precedence and performing the computation accordingly. It converts the arguments to the required data type for the function and returns the numerical value.
PERCENT_RANK: It is similar to the CUME_DIST (cumulative distribution) function. It returns a value ranging from 0 to 1, with 0 and 1 included. The first row in any set has a PERCENT_RANK of 0. The functions return a numerical value as a result.
NTH_VALUE: This function calculates the measure_expr value for the nth row in the window. The row must be included in the window defined in the analytic clause. The return value has the same data type as the data type for the measure_expr.
- {RESPECT | IGNORE} NULLS: It determines whether measure_expr with null values will be included or excluded while computing the analytical function. By default, the null values are considered during the computation; the value is set at RESPECT NULLS.
- In this function, n represents the nth row for which the function will compute the value. N can be a constant value, bind variable, column, or expression containing either of these values. The only requisitive is that the resulting integer has a positive value. It must contain n rows to compute the result. The function will return a NULL value if the source window has less than n rows. If the value is, the NULL function returns an error.
- FROM {FIRST | LAST}: It tells the system whether the computation starts from the beginning or end row of the window. By default, the computation begins from the first row in the window.
MIN: This analytical function will return the least value in the expr. The function can be used as the aggregate or the analytic function.
MAX: This analytical function will return the highest least value in the expr. The function can be used as the aggregate or the analytic function.