The ROLLUP in MySQL is a modifier used to produce the summary output, including extra rows that represent super-aggregate (higher-level) summary operations. It enables us to sum-up the output at multiple levels of analysis using a single query. It is mainly used to provide support for OLAP (Online Analytical Processing) operations.
The ROLLUP modifier can only be used with the GROUP BY query in MySQL.
The following are the syntax to use the ROLLUP modifier:
In this syntax, we need to specify the column names displayed in the query result in the SELECT clause. Next, we will mention the table name. After that, we have specified the GROUP BY clause, including the column names based on which we want to aggregate data. Finally, we specify the WITH ROLLUP modifier to get the super-aggregate output in an additional row.
We have already learned that the GROUP BY query is applied with aggregate functions such as MAX, MIN, SUM, COUNT, AVG, etc. that groups the output rows by single or multiple columns. The ROLLUP modifier is an option to use GROUP BY query that includes extra fields for representing the subtotals. These additional rows are called super-aggregate rows, which is the combination of the grand-total rows. Thus, the ROLLUP modifier allows us to create multiple grouping of set rows within a single query based on the columns specified in the GROUP BY clause in MySQL.
MySQL ROLLUP Explanation
If we want to understand the ROLLUP modifier, we must have to know what is grouping set. A grouping set is a set of columns that we want to group to get the resultant output. For example, suppose we have a table "sales" that contains the following data:
If we want to summarize the result per year, we will use the simple GROUP BY clause as below:
It will give the below output that shows the total (aggregate) sale for each year:
In the above query, the grouping set is denoted by the column name Year. If we need to generate more than one grouping sets together in a single query, we may use the UNION ALL operator as follows:
In this query, we can see the NULL column. It is because the UNION ALL clause requires all queries to have the same number of columns. So to fulfill this requirement, we have added NULL in the select list of the second query.
When we execute the query, we get the following output:
The NULL in the output of the Year column represents the grand-total super-aggregate value. Since this query is able to generate the total sales in each year and also grand-total aggregate sales, however, it has two problems:
To solve these issues, MySQL allows us to use the ROLLUP clause, which provides both levels of analysis in one query. The ROLLUP clause is an extension of the GROUP BY clause that produces another row and shows the grand-total (super-aggregate) values.
Let us see the result after adding a WITH ROLLUP modifier to the GROUP BY clause that shows the grand total over all year values:
When we execute the command, we will get the output as follows:
In this output, we can see the NULL value in the Year column that identifies the supper-aggregate line. It clearly shows that the ROLLUP clause not only generates the subtotals but also gives the grand total of the total sales in all year.
If the GROUP BY clause has more than one column, the ROLLUP modifier has a more complex effect. In this case, the ROLLUP modifier assumes a hierarchy among the columns specified in the GROUP BY clause. Each time there is a change in column value, the query generates an extra super-aggregate summary row at the end of the result.
For example, assumes we have specified the three columns in the GROUP BY clause as below:
The ROLLUP modifier assumes the hierarchy as below:
And generates the following grouping set:
See the below query to explain it more clearly:
Without ROLLUP, a summary of the sales table based on multiple columns specified in the GROUP BY clause look like the below output. Here we will get the summary values only at the Year/Country/Product level of analysis.
With ROLLUP added, the query produces several extra rows:
See the below output:
The above output generates information at four levels of analysis, which are explained below:
If we change the order of the columns specified in the GROUP BY column, we will get a different result:
See the following output:
The GROUPING() function is used to check whether NULL in the result set represents the regular grouped value, a super-aggregate value, or grand totals. When NULL occurs in a supper-aggregate row, it returns one. Otherwise, it returns 0.
We can use the GROUPING() function in the select list, HAVING clause, and ORDER BY clause.
See the below query:
We will get the following output where the GROUPING(Year) returns one when NULL in the Year column occurs in a super-aggregate row. Otherwise, it will return zero.
Similarly, the GROUPING(Country) returns one when NULL in the Country column occurs in a super-aggregate row. Otherwise, it will return zero.
Also, the GROUPING(Product) returns one when NULL in the Product column occurs in a super-aggregate row. Otherwise, it will return zero.
We can also use the GROUPING() function to substitute the meaningful labels for super-aggregate NULL values instead of displaying it directly.
The below query explains how to combine the IF() function with the GROUPING() function to substitute labels for the super-aggregate NULL values in Year, Country, and Product columns:
We will get the output as below:
If we have multiple arguments in the GROUPING() function, it will return output representing a bitmask that combines the results for each expression. Here, the lowest-order bit produces the result for the right-most argument. The below example will be evaluated like this:
Example: GROUPING (Year, Country, Product)
If any arguments have a super-aggregate NULL value, the result of such GROUPING() is non-zero. In this case, it will return only the super-aggregate rows and filter the regular grouped rows using the following query:
It will give the below output: